On-Line Portfolio Selection: A Survey

Bin Li and Steven C. H. Hoi

School of Computer Engineering, 
Nanyang Technological University, 
50 Nanyang Avenue, Singapore 639798 

December 11, 2012 

Abstract 

On-line portfolio selection is a fundamental problem in computational finance, 
which has been extensively studied across several research communities, includ- 
ing finance, statistics, artificial intelligence, machine learning, and data mining, 
etc. This article aims to provide a comprehensive survey and a structural under- 
standing of existing on-line portfolio selection techniques in literature. From an 
on-line machine learning perspective, we first formulate on-line portfolio selection 
as an on-line sequential decision problem, and then survey a variety of state-of- 
the-art approaches in literature, which are grouped into several major categories, 
including benchmarks, "Follow-the-Winner" approaches, "Follow-the-Loser" approaches,
"Pattern-Matching" based approaches, and meta-learning algorithms. In
addition to the problem formulation and related algorithms, we also discuss the
relationship of these algorithms with the Capital Growth theory in order to better
understand the similarities and differences of their underlying trading ideas. This
article aims to provide a timely and comprehensive survey for both machine learn- 
ing and data mining researchers in academia and quantitative portfolio managers 
in financial industry to help them understand the state of the art and facilitate their 
research or practical applications. We also discuss some open issues and evaluate 
some emerging new trends for future research directions. 

1 Introduction 

Portfolio selection, aiming to optimize the allocation of wealth across a set of assets, 
is a fundamental research problem in computational finance and a practical engineering
task in financial engineering. There are two major schools for investigating this
research, that is, the Mean Variance Theory Markowitz [1952, 1959], Markowitz et al.
[2000] mainly from finance community and the Capital Growth Theory Kelly [1956],
Hakansson and Ziemba [1995] primarily originated from information theory. The Mean
Variance Theory, widely known in asset management industry, focuses on single-period
(batch) portfolio selection to trade off portfolio's expected return (mean) and risk (vari- 
ance), which typically decides the optimal portfolios with an investor's preference of 
return or risk. On the other hand, the Capital Growth Theory focuses on multiple-period 
or sequential portfolio selection, aiming to maximize portfolio's expected growth rate, 
or expected log return. While both theories solve the task of portfolio selection, the 
latter is fitted to the "online" scenario, which naturally consists of multiple periods and 
is the focus of this article. 

On-line portfolio selection, which sequentially selects a portfolio over a set of as- 
sets in order to achieve certain targets, is a natural and important task for asset man- 
agement community. Aiming to maximize the cumulative wealth, several categories of 
algorithms have been proposed to solve this task. One category of algorithms, termed 
"Follow-the-Winner", tries to asymptotically achieve the same growth rate (expected 
log return) as that of an optimal strategy, which is often based on the Capital Growth 
Theory. The second category, named "Follow-the-Loser", transfers the wealth from 
winner assets to losers, which seems contradictory to the common sense but empir- 
ically often achieves significantly better performance. Moreover, the third category, 
termed "Pattern-Matching" based approach, tries to predict the next market distribu- 
tion based on a sample of historical data and explicitly optimizes the portfolio based 
on the distribution. While the above three categories are focused on a single strategy 
(class), there are also some other strategies which are focused on combining multiple
strategies (classes), termed as "Meta-Learning Algorithms". As a brief summary,
Table 1 outlines the list of main algorithms and their references.

This article conducts a comprehensive survey on the area of on-line portfolio selec- 
tion algorithms according to the above categories. To the best of our knowledge, this 
is the first survey that includes the above three categories and the meta-learning algo- 
rithms as well. Moreover, we are the first to explicitly show the connection between 
the on-line portfolio selection algorithms and the Capital Growth Theory, and illustrate 
their underlying trading ideas. In the following section, we also clarify the scope of 
this article and discuss some related existing surveys in literature. 



1.1 Scope 



In this survey, we focus on discussing the empirical motivating ideas of the on-line
portfolio selection algorithms, instead of analyzing theoretical aspects (such as
competitive analysis by El-Yaniv [1998] and Borodin et al. [2000] and asymptotical
convergence analysis by Györfi et al. [2012]). In the following, we discuss some scopes
which will not be covered.

First of all, it is important to mention that the "Portfolio Selection" task in our
survey differs from a great body of financial engineering studies Kimoto et al. [1993],
Merhav and Feder [1998], Cao and Tay [2003], Lu et al. [2009], Dhar [2011], Huang et al.
[2011], which attempted to forecast financial time series by applying machine learning
techniques and conduct single stock trading Katz and McCormick [2000], Koolen and Vovk
[2012], such as reinforcement learning Moody et al. [1998], Moody and Saffell [2001],
O et al. [2002], neural networks Kimoto et al. [1993], Dempster et al. [2001], genetic
algorithms Mahfoud and Mani [1996], Allen and Karjalainen [1999], Mańdziuk and Jaruszewicz
[2011], decision trees Tsang et al. [2004], and support vector machines Tay and Cao
[2002], Cao and Tay [2003], Lu et al. [2009], boosting and expert weighting Creamer
[2007], Creamer and Freund [2007, 2010], Creamer [2012], etc. The key difference
between these existing works and ours is that their learning goal is to make explicit
predictions of future prices/trends and to trade on a single asset [Borodin et al., 2000,
Section 6], while our task goal is to directly optimize the allocation among a set of
assets without involving explicit price predictions.

Table 1: General classifications of the state-of-the-art on-line portfolio selection
algorithms.

Classifications           Algorithms                                             Representative References
------------------------  -----------------------------------------------------  -----------------------------
Benchmarks                Buy And Hold
                          Best Stock
                          Constant Rebalanced Portfolios                         Kelly [1956]; Cover [1991]
Follow-the-Winner         Universal Portfolios                                   Cover [1991]
                          Exponential Gradient                                   Helmbold et al. [1996, 1998]
                          Follow the Leader                                      Gaivoronski and Stella [2000]
                          Follow the Regularized Leader                          Agarwal et al. [2006]
                          Aggregating-type Algorithms                            Vovk and Watkins [1998]
Follow-the-Loser          Anti Correlation                                       Borodin et al. [2003, 2004]
                          Passive Aggressive Mean Reversion                      Li et al. [2012]
                          Confidence Weighted Mean Reversion                     Li et al. [2011a, 2013]
                          On-Line Moving Average Reversion                       Li and Hoi [2012]
Pattern-Matching based    Nonparametric Histogram Log-optimal Strategy           Györfi et al. [2006]
Approach                  Nonparametric Kernel-based Log-optimal Strategy        Györfi et al. [2006]
                          Nonparametric Nearest Neighbor Log-optimal Strategy    Györfi et al. [2008]
                          Correlation-driven Nonparametric Learning Strategy     Li et al. [2011b]
                          Nonparametric Kernel-based Semi-log-optimal Strategy   Györfi et al. [2007]
                          Nonparametric Kernel-based Markowitz-type Strategy     Ottucsák and Vajda [2007]
                          Nonparametric Kernel-based GV-type Strategy            Györfi and Vajda [2008]
Meta-Learning Algorithms  Aggregating Algorithm                                  Vovk [1990, 1998]
                          Fast Universalization Algorithm                        Akcoglu et al. [2002, 2004]
                          Online Gradient Updates                                Das and Banerjee [2011]
                          Online Newton Updates                                  Das and Banerjee [2011]
                          Follow the Leading History                             Hazan and Seshadhri [2009]

Second, this survey emphasizes the importance of "On-Line" decision for portfolio 
selection, meaning that related market information arrives sequentially and the alloca- 
tion decision must be made immediately. Due to the sequential (on-line) nature of this 
task, we mainly focus on the survey of multi-period/sequential portfolio selection work, 
in which the portfolio is rebalanced to a specified allocation at the end of each trading
period Cover [1991], and the goal typically is to maximize the expected log return over
a sequence of trading periods. We note that these works can be connected to the Capital
Growth Theory Kelly [1956], stemmed from the seminal paper of Kelly [1956] and
further developed by Breiman [1960, 1961], Hakansson [1970, 1971], Thorp [1969, 1971],
Bell and Cover [1980], Finkelstein and Whitley [1981], Algoet and Cover [1988], Barron
and Cover [1988], MacLean et al. [1992], MacLean and Ziemba [1999], Ziemba and Ziemba
[2007], MacLean et al. [2010], etc. It has been successfully applied to gambling Thorp
[1962, 1969, 1997], sports betting Hausch et al. [1981], Ziemba and Hausch [1984], Thorp
[1997], Ziemba and Hausch [2008], and portfolio investment Thorp and Kassouf [1967],
Rotando and Thorp [1992], Ziemba [2005]. We thus exclude the studies related to the
Mean Variance portfolio theory Markowitz [1952, 1959], which were typically developed
for single-period (batch) portfolio selection (except some extensions Li and Ng
[2000], Dai et al. [2010]).



Finally, this article focuses on surveying the algorithmic aspects and providing a 
structural understanding of the existing on-line portfolio selection strategies. To
prevent loss of focus, we will not dig into the details of the theory works. In literature,
there is a large body of related work for the theory studies MacLean et al. [2011].
Interested researchers can explore the details of the theory from two exhaustive surveys
Thorp [1997], MacLean and Ziemba [2008], and its history from Poundstone [2005] and
Györfi et al. [2012, Chapter 1].



1.2 Related Surveys 



In literature, there exist several related surveys in this area, but none of them is
comprehensive and timely enough for understanding the state of the art of on-line
portfolio selection research. For example, El-Yaniv [1998, Section 5] and Borodin et al.
[2000] surveyed the on-line portfolio selection problem in the framework of competitive
analysis. Using our classification in Table 1, Borodin et al. mainly surveyed the
benchmarks and two Follow-the-Winner algorithms, that is, Universal Portfolios and
Exponential Gradient (refer to the details in Section 3.2). Although the competitive
framework is important for the Follow-the-Winner category, both surveys are out of
date and do not include a number of state-of-the-art algorithms proposed afterward. A
recent survey by Györfi et al. [2012, Chapter 2] mainly surveyed the Pattern-Matching
based approaches, i.e., the third category as shown in Table 1, which does not include the
other categories in this area and is thus far from complete.

1.3 Organization 

The remainder of this article is organized as follows. Section 2 formulates the problem
of on-line portfolio selection formally and addresses several practical issues. Section 3
introduces the state-of-the-art algorithms, including Benchmarks in Section 3.1, the
Follow-the-Winner approaches in Section 3.2, Follow-the-Loser approaches in Section 3.3,
Pattern-Matching based Approaches in Section 3.4, and Meta-Learning Algorithms
in Section 3.5, etc. Section 4 connects the existing algorithms with the Capital
Growth Theory and also illustrates the essentials of their underlying trading ideas.
Section 5 discusses several related open issues, and finally Section 6 concludes this survey
and outlines some future directions.

2 Problem Setting 

Consider a financial market with $m$ assets, over which we invest our wealth for a
sequence of $n$ trading periods. The market price change is represented by
an $m$-dimensional price relative vector $\mathbf{x}_t \in \mathbb{R}_+^m$, $t = 1, \ldots, n$,
where the $i$-th element of the $t$-th price relative vector, $x_{t,i}$, denotes the ratio
of the $t$-th closing price to the last closing price for the $i$-th asset. Thus, an
investment in asset $i$ on period $t$ increases by a factor of $x_{t,i}$.
We also denote the market price changes from period $t_1$ to $t_2$ ($t_2 > t_1$) by a market
window, which consists of a sequence of price relative vectors
$\mathbf{x}_{t_1}^{t_2} = \{\mathbf{x}_{t_1}, \ldots, \mathbf{x}_{t_2}\}$,
where $t_1$ denotes the beginning period and $t_2$ denotes the ending period. One special
market window starts from period 1 to $n$, that is, $\mathbf{x}_1^n = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$.

At the beginning of the $t$-th period, an investment is specified by a portfolio vector
$\mathbf{b}_t$, $t = 1, \ldots, n$. The $i$-th element of the $t$-th portfolio, $b_{t,i}$,
represents the proportion of capital invested in the $i$-th asset. Typically, we assume a
portfolio is self-financed and no margin/short is allowed. Thus, a portfolio satisfies the
constraint that each entry is non-negative and all entries sum up to one, that is,
$\mathbf{b}_t \in \Delta_m$, where
$\Delta_m = \{\mathbf{b} : \mathbf{b} \succeq 0, \mathbf{b}^\top \mathbf{1} = 1\}$, and
$\mathbf{1}$ is the $m$-dimensional vector of all 1s. The investment procedure from
period 1 to $n$ is represented by a portfolio strategy, which is
a sequence of mappings as follows:

$$\mathbf{b}_1 = \frac{1}{m}\mathbf{1}, \qquad \mathbf{b}_t : \mathbb{R}_+^{m(t-1)} \to \Delta_m, \quad t = 2, 3, \ldots, n,$$

where $\mathbf{b}_t = \mathbf{b}_t(\mathbf{x}_1^{t-1})$ denotes the portfolio learned from the
past market window $\mathbf{x}_1^{t-1}$. Let us denote the portfolio strategy for $n$ periods
as $\mathbf{b}^n = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$.

For the $t$-th period, a portfolio manager apportions its capital according to portfolio
$\mathbf{b}_t$ at the opening time, and holds the portfolio until the closing time. Thus,
the portfolio wealth will increase by a factor of
$\mathbf{b}_t^\top \mathbf{x}_t = \sum_{i=1}^{m} b_{t,i} x_{t,i}$. Since this model uses price
relatives and re-invests the capital, the portfolio wealth will increase multiplicatively.
Finally, from period 1 to $n$, a portfolio strategy $\mathbf{b}^n$ increases the initial
wealth $S_0$ by a factor of $\prod_{t=1}^{n} \mathbf{b}_t^\top \mathbf{x}_t$, that is, the final
cumulative wealth after a sequence of $n$ periods is

$$S_n(\mathbf{b}^n) = S_0 \prod_{t=1}^{n} \mathbf{b}_t^\top \mathbf{x}_t = S_0 \prod_{t=1}^{n} \sum_{i=1}^{m} b_{t,i} x_{t,i}.$$

Algorithm 1: On-line portfolio selection framework.
Input: $\mathbf{x}_1^n$: Historical market sequence
Output: $S_n$: Final cumulative wealth

Initialize $S_0 = 1$, $\mathbf{b}_1 = (\frac{1}{m}, \ldots, \frac{1}{m})$
for $t = 1, 2, \ldots, n$ do
    Portfolio manager learns a portfolio $\mathbf{b}_t$ ;
    Market reveals the market price relative $\mathbf{x}_t$ ;
    Portfolio incurs period return $\mathbf{b}_t^\top \mathbf{x}_t$ and updates cumulative return
        $S_t = S_{t-1} \times (\mathbf{b}_t^\top \mathbf{x}_t)$ ;
    Portfolio manager updates his/her on-line portfolio selection rules ;
end

Since the model assumes multi-period investment, we define the exponential growth
rate for a strategy $\mathbf{b}^n$ as

$$W_n(\mathbf{b}^n) = \frac{1}{n} \log S_n(\mathbf{b}^n) = \frac{1}{n} \sum_{t=1}^{n} \log(\mathbf{b}_t \cdot \mathbf{x}_t).$$

Finally, let us combine all elements and formulate the on-line portfolio selection
model. In a portfolio selection task the decision maker is a portfolio manager, whose
goal is to produce a portfolio strategy $\mathbf{b}^n$ in order to achieve certain targets.
Following the principle of the algorithms shown in Table 1, our target is to maximize the
portfolio cumulative wealth $S_n$. The portfolio manager computes the portfolio strategy in a
sequential fashion. At the beginning of period $t$, based on the previous market window
$\mathbf{x}_1^{t-1}$, the portfolio manager learns a new portfolio vector $\mathbf{b}_t$ for the
coming price relative vector $\mathbf{x}_t$, where the decision criterion varies among
different managers/strategies. The portfolio $\mathbf{b}_t$ is scored using the portfolio
period return $\mathbf{b}_t \cdot \mathbf{x}_t$. This procedure
is repeated until period $n$, and the strategy is finally scored according to the portfolio
cumulative wealth $S_n$. Algorithm 1 shows the framework of on-line portfolio selection,
which serves as a general procedure to backtest any on-line portfolio selection
algorithm.
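The sequential protocol described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `backtest`, `growth_rate`, and `uniform_crp`, the strategy signature, and the toy two-asset market are assumptions introduced here.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def backtest(price_relatives: Sequence[Vector],
             strategy: Callable[[Sequence[Vector], int], List[float]],
             s0: float = 1.0) -> List[float]:
    """Run the sequential loop and return the wealth path [S_0, S_1, ..., S_n]."""
    m = len(price_relatives[0])
    wealth = [s0]
    for t, x_t in enumerate(price_relatives):
        # The strategy only sees the past market window (periods before the current one).
        b_t = strategy(price_relatives[:t], m)
        assert min(b_t) >= 0 and abs(sum(b_t) - 1.0) < 1e-9, "b_t must lie in the simplex"
        period_return = sum(b * x for b, x in zip(b_t, x_t))  # b_t . x_t
        wealth.append(wealth[-1] * period_return)
    return wealth


def growth_rate(wealth: Sequence[float]) -> float:
    """Exponential growth rate W_n = (1/n) * log(S_n / S_0)."""
    n = len(wealth) - 1
    return math.log(wealth[-1] / wealth[0]) / n


def uniform_crp(history: Sequence[Vector], m: int) -> List[float]:
    """Toy strategy (hypothetical): ignore the history, always hold the uniform portfolio."""
    return [1.0 / m] * m


# Toy market: a cash asset and an asset that alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5), (1.0, 2.0), (1.0, 0.5)]
wealth = backtest(market, uniform_crp)
print(wealth[-1], growth_rate(wealth))
```

Any strategy that maps the observed history to a point in the simplex can be passed in place of `uniform_crp`; the loop itself never lets the strategy look at the current or future price relatives.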

In general, some assumptions are made in the above widely adopted model: 

1. Transaction cost: we assume no transaction costs/taxes in the model;

2. Market liquidity: we assume that one can buy and sell any quantity at last closing 
prices; 

3. Impact cost: we assume market behavior is not affected by any portfolio selection
strategy.

To better understand the notions and model above, let us illustrate by a classical
example.

Example 2.1 (Virtual market by Cover and Gluss [1986]). Assume a two-asset market
with one cash and one volatile asset with the price relative sequence
$\mathbf{x}_1^n = \{(1, 2), (1, \frac{1}{2}), (1, 2), \ldots\}$.
The 1st price relative vector $\mathbf{x}_1 = (1, 2)$ means that if we invest \$1 in the first
asset, we will get \$1 at the end of the period; if we invest \$1 in the second asset, we
will get \$2 after the period. Let a fixed proportion portfolio strategy be
$\mathbf{b}^n = \{(\frac{1}{2}, \frac{1}{2}), (\frac{1}{2}, \frac{1}{2}), \ldots\}$,
which means every day the manager redistributes the capital equally among the two assets.
For the 1st period, the portfolio wealth increases by a factor of
$1 \times \frac{1}{2} + 2 \times \frac{1}{2} = \frac{3}{2}$.
Initializing the capital with $S_0 = 1$, the capital at the end of the 1st period equals
$S_1 = S_0 \times \frac{3}{2} = \frac{3}{2}$. Similarly,
$S_2 = S_1 \times (1 \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2}) = \frac{3}{2} \times \frac{3}{4} = \frac{9}{8}$.
Thus, at the end of period $n$, the final cumulative wealth equals

$$S_n(\mathbf{b}^n) = \begin{cases} \left(\frac{9}{8}\right)^{\frac{n}{2}} & n \text{ is even} \\ \frac{3}{2} \times \left(\frac{9}{8}\right)^{\frac{n-1}{2}} & n \text{ is odd} \end{cases}$$

and the exponential growth rate is

$$W_n(\mathbf{b}^n) = \begin{cases} \frac{1}{2} \log \frac{9}{8} & n \text{ is even} \\ \frac{n-1}{2n} \log \frac{9}{8} + \frac{1}{n} \log \frac{3}{2} & n \text{ is odd} \end{cases}$$

which approaches $\frac{1}{2} \log \frac{9}{8} > 0$ if $n$ is sufficiently large.
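The arithmetic of Example 2.1 can be checked exactly with rational numbers. The sketch below is only a sanity check; the function names `crp_wealth` and `closed_form` are introduced here for illustration.

```python
from fractions import Fraction


def crp_wealth(n: int) -> Fraction:
    """Wealth of the (1/2, 1/2) fixed-proportion strategy after n periods, S_0 = 1."""
    half = Fraction(1, 2)
    wealth = Fraction(1)
    for t in range(1, n + 1):
        # Odd periods have x_t = (1, 2); even periods have x_t = (1, 1/2).
        volatile = Fraction(2) if t % 2 == 1 else Fraction(1, 2)
        wealth *= half * 1 + half * volatile
    return wealth


def closed_form(n: int) -> Fraction:
    """Closed form from the example: (9/8)^(n/2), with an extra factor 3/2 when n is odd."""
    base = Fraction(9, 8) ** (n // 2)
    return base if n % 2 == 0 else Fraction(3, 2) * base


for n in range(1, 11):
    assert crp_wealth(n) == closed_form(n)

print(crp_wealth(1), crp_wealth(2))  # 3/2 9/8
```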



2.1 Transaction Cost 

In reality, the most important and unavoidable issue is transaction costs. In this
section, we model the transaction costs into our formulation, which enables us to
evaluate on-line portfolio selection algorithms. However, we will not introduce
strategies Davis and Norman [1990], Iyengar and Cover [2000], Akian et al. [2001], Schäfer
[2002], Györfi and Vajda [2008] that directly solve the transaction costs issues.



The widely adopted transaction costs model is the proportional transaction costs model
Györfi and Vajda [2008], in which the incurred transaction cost is proportional to the
wealth transferred during rebalancing. Let the broker charge transaction costs on both
buying and selling. At the beginning of the $t$-th period, the portfolio manager intends to
rebalance the portfolio from the closing price adjusted portfolio $\hat{\mathbf{b}}_{t-1}$ to a
new portfolio $\mathbf{b}_t$. Here $\hat{\mathbf{b}}_{t-1}$ is calculated as
$\hat{b}_{t-1,i} = \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}}$, $i = 1, \ldots, m$.
Assume two transaction cost rates $c_b \in (0, 1)$ and $c_s \in (0, 1)$, where $c_b$ denotes
the transaction costs rate incurred during buying rebalancing and $c_s$ denotes the
transaction costs rate incurred by selling rebalancing. After rebalancing, $S_{t-1}$ will be
decomposed into two parts, that is, the net wealth $N_{t-1}$ in the new portfolio
$\mathbf{b}_t$ and the transaction costs incurred during the buying and selling. If the wealth
on asset $i$ before rebalancing is higher than that after rebalancing, that is,
$\frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} S_{t-1} > b_{t,i} N_{t-1}$,
then there will be a selling rebalancing. Otherwise, a buying rebalancing is required.
Formally, writing $(a)^+ = \max(a, 0)$,

$$S_{t-1} = N_{t-1} + c_s \sum_{i=1}^{m} \left( \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} S_{t-1} - b_{t,i} N_{t-1} \right)^+ + c_b \sum_{i=1}^{m} \left( b_{t,i} N_{t-1} - \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} S_{t-1} \right)^+.$$

Let us denote the transaction costs factor Györfi and Vajda [2008] as the ratio of net
wealth after rebalancing to wealth before rebalancing, that is,
$w_{t-1} = \frac{N_{t-1}}{S_{t-1}} \in (0, 1)$. Dividing the above equation by $S_{t-1}$, we get

$$1 = w_{t-1} + c_s \sum_{i=1}^{m} \left( \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} - b_{t,i} w_{t-1} \right)^+ + c_b \sum_{i=1}^{m} \left( b_{t,i} w_{t-1} - \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} \right)^+. \qquad (1)$$

Clearly, given $\mathbf{b}_{t-1}$, $\mathbf{x}_{t-1}$, and $\mathbf{b}_t$, there exists a unique
transaction costs factor for each rebalancing. Thus, we can denote $w_{t-1}$ as a function,
$w_{t-1} = w(\mathbf{b}_t, \mathbf{b}_{t-1}, \mathbf{x}_{t-1})$.
Moreover, considering the portfolio is in the simplex domain, the factor ranges
between $\frac{1 - c_s}{1 + c_b} \le w_{t-1} \le 1$.

Blum and Kalai [1999] considered another proportional transaction cost model, except
that they assume that the transaction costs are only incurred during buying rebalancing.
The authors assume a single transaction cost rate $c \in (0, 1)$ and the transaction
costs factor $w_{t-1}$, then

$$1 = w_{t-1} + c \sum_{i=1}^{m} \left( b_{t,i} w_{t-1} - \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} \right)^+.$$

For reasonable transaction costs rates, they eased the calculation by approximating the
transaction costs factor as

$$w_{t-1} \approx 1 - c \sum_{i=1}^{m} \left( b_{t,i} - \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} \right)^+. \qquad (2)$$

Besides, they argued that this setting can be assumed without loss of generality as they
can set $c = c_b + c_s$ and \$1 in one asset can be rebalanced to $\ge 1 - c$ in a different
asset.

To empirically evaluate an on-line portfolio selection algorithm, Borodin et al. [2003,
2004] propose a variation from Blum and Kalai [1999]. They assume that for each buying
and selling, the portfolio manager pays a transaction rate of $\frac{c}{2}$ and consider an
approximate transaction costs factor similar to Eq. (2), that is,

$$w_{t-1} \approx 1 - \frac{c}{2} \sum_{i=1}^{m} \left| b_{t,i} - \frac{b_{t-1,i} x_{t-1,i}}{\mathbf{b}_{t-1} \cdot \mathbf{x}_{t-1}} \right|. \qquad (3)$$

For all three models, the final cumulative wealth after $n$ periods equals

$$S_n = S_0 \prod_{t=1}^{n} w_{t-1} \times (\mathbf{b}_t \cdot \mathbf{x}_t),$$

where $w_{t-1}$ depends on the chosen transaction cost model (Eq. (1), (2), or (3)).
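A small numerical sketch may help make the three transaction costs factors concrete. It assumes that the implicit equation for the two-rate model reads 1 = w + c_s * sum((b_hat_i - b_i * w)^+) + c_b * sum((b_i * w - b_hat_i)^+), where b_hat is the closing price adjusted portfolio, and it solves that equation by bisection. Bisection and all function names are choices made for this sketch, not something prescribed by the cited papers.

```python
from typing import List, Sequence


def adjusted_portfolio(b_prev: Sequence[float], x_prev: Sequence[float]) -> List[float]:
    """Closing price adjusted portfolio: b_hat_i = b_i * x_i / (b . x)."""
    total = sum(b * x for b, x in zip(b_prev, x_prev))
    return [b * x / total for b, x in zip(b_prev, x_prev)]


def cost_factor_exact(b_new: Sequence[float], b_hat: Sequence[float],
                      c_s: float, c_b: float, tol: float = 1e-12) -> float:
    """Solve the implicit two-rate equation for w by bisection on [0, 1]."""
    def residual(w: float) -> float:
        sell = sum(max(h - b * w, 0.0) for b, h in zip(b_new, b_hat))
        buy = sum(max(b * w - h, 0.0) for b, h in zip(b_new, b_hat))
        return w + c_s * sell + c_b * buy - 1.0

    # residual(0) = c_s - 1 < 0 <= residual(1), and residual is increasing in w.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cost_factor_buy_only(b_new: Sequence[float], b_hat: Sequence[float], c: float) -> float:
    """Approximate factor when rate c is charged on purchases only."""
    return 1.0 - c * sum(max(b - h, 0.0) for b, h in zip(b_new, b_hat))


def cost_factor_half_rate(b_new: Sequence[float], b_hat: Sequence[float], c: float) -> float:
    """Approximate factor when rate c/2 is charged on every unit of turnover."""
    return 1.0 - 0.5 * c * sum(abs(b - h) for b, h in zip(b_new, b_hat))


b_hat = adjusted_portfolio([0.5, 0.5], [1.0, 2.0])   # holdings drift to (1/3, 2/3)
b_new = [0.5, 0.5]                                   # rebalance back to uniform
print(cost_factor_exact(b_new, b_hat, c_s=0.005, c_b=0.005))
print(cost_factor_buy_only(b_new, b_hat, c=0.01))
print(cost_factor_half_rate(b_new, b_hat, c=0.01))
```

Because both portfolios sum to one, the total amount bought equals the total amount sold, so the buy-only and half-rate approximations give the same number for the same rate c.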



3 On-Line Portfolio Selection Approaches 

In this section, we survey the area of on-line portfolio selection. Algorithms in this
area formulate the on-line portfolio selection task as in Section 2 and derive an explicit
portfolio update scheme for each period. Basically, the routine is to implicitly assume
various price relative predictions and learn optimal portfolios for each period.

In the subsequent sections, we mainly list the algorithms following Table 1. In
particular, we first introduce several benchmark algorithms in Section 3.1. Then, we
introduce the algorithms with explicit update schemes in the subsequent three sections.
We classify them based on the directions of the weights transfer. The first approach,
Follow-the-Winner approach, tries to increase the relative weights of more successful
experts/stocks, often based on their historical performance. On the contrary, the second
approach, Follow-the-Loser approach, tries to increase the relative weights of less
successful experts/stocks, or transfer the weights from the winners to losers. The third
approach, Pattern-Matching based approach, tries to build a portfolio based on some
sampled similar historical patterns with no explicit weights transfer directions. After
that, we survey some meta-learning algorithms, which can be applied on higher level
experts equipped with any existing algorithms.

3.1 Benchmarks 

3.1.1 Buy And Hold Strategy 

The most common baseline is the Buy-And-Hold (BAH) strategy, that is, one invests wealth
among a pool of assets with an initial portfolio $\mathbf{b}_1$ and holds the portfolio until
the end. The manager only buys the assets at the beginning of the 1st period and does not
rebalance in the following periods, while the portfolio holdings are implicitly changed
following the market fluctuations. For example, at the end of the 1st period, the portfolio
holding becomes $\frac{\mathbf{b}_1 \odot \mathbf{x}_1}{\mathbf{b}_1^\top \mathbf{x}_1}$, where
$\odot$ denotes element-wise product. In summary,
the final cumulative wealth achieved by a BAH strategy is the initial portfolio weighted
average of individual stocks' final wealth, that is,

$$S_n(BAH(\mathbf{b}_1)) = \mathbf{b}_1 \cdot \left( \bigodot_{t=1}^{n} \mathbf{x}_t \right).$$

The BAH strategy with initial uniform portfolio $\mathbf{b}_1 = (\frac{1}{m}, \ldots, \frac{1}{m})$
is referred to as uniform BAH strategy, which is often adopted as market strategy to produce
market index.

3.1.2 Best Stock Strategy 

Another widely adopted benchmark is the Best Stock (Best) strategy, which is a special BAH
strategy that puts all capital on the stock with best performance in hindsight. Clearly,
its initial portfolio $\mathbf{b}^\circ$ in hindsight can be calculated as

$$\mathbf{b}^\circ = \arg\max_{\mathbf{b} \in \Delta_m} \mathbf{b} \cdot \left( \bigodot_{t=1}^{n} \mathbf{x}_t \right).$$

As a result, the final cumulative wealth achieved by a Best strategy can be calculated
as

$$S_n(Best) = \max_{\mathbf{b} \in \Delta_m} \mathbf{b} \cdot \left( \bigodot_{t=1}^{n} \mathbf{x}_t \right) = S_n(BAH(\mathbf{b}^\circ)).$$
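The two buy-and-hold benchmarks can be sketched as follows. The code is illustrative only: the function names and the three-period toy market are assumptions made here. It also checks that the closed form agrees with a period-by-period simulation in which the holdings drift with the market.

```python
from math import prod
from typing import List, Sequence

Vector = Sequence[float]


def asset_wealth(price_relatives: Sequence[Vector]) -> List[float]:
    """Element-wise product of all price relatives: final value of 1 unit held in each asset."""
    m = len(price_relatives[0])
    return [prod(x[i] for x in price_relatives) for i in range(m)]


def bah_wealth(b1: Vector, price_relatives: Sequence[Vector]) -> float:
    """Closed form: initial-portfolio weighted average of each asset's final wealth."""
    return sum(b * w for b, w in zip(b1, asset_wealth(price_relatives)))


def bah_wealth_simulated(b1: Vector, price_relatives: Sequence[Vector]) -> float:
    """Same quantity period by period: holdings drift to (b * x) / (b . x) without trading."""
    wealth, b = 1.0, list(b1)
    for x in price_relatives:
        period_return = sum(bi * xi for bi, xi in zip(b, x))
        wealth *= period_return
        b = [bi * xi / period_return for bi, xi in zip(b, x)]
    return wealth


def best_stock_wealth(price_relatives: Sequence[Vector]) -> float:
    """Best Stock in hindsight: all capital on the asset with the largest final wealth."""
    return max(asset_wealth(price_relatives))


toy_market = [(1.1, 0.9), (1.0, 1.2), (0.9, 1.1)]   # hypothetical price relatives
print(bah_wealth([0.5, 0.5], toy_market), best_stock_wealth(toy_market))
```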



3.1.3 Constant Rebalanced Portfolios 

Another more challenging benchmark strategy is the Constant Rebalanced Portfolios (CRP)
strategy, which rebalances the portfolio to a fixed portfolio $\mathbf{b}$ for every period.
In particular, the portfolio strategy can be represented as
$\mathbf{b}^n = \{\mathbf{b}, \mathbf{b}, \ldots\}$. Thus, the cumulative portfolio wealth
achieved by a CRP strategy after $n$ periods is defined as

$$S_n(CRP(\mathbf{b})) = \prod_{t=1}^{n} \mathbf{b}^\top \mathbf{x}_t.$$

One special CRP strategy that rebalances to the uniform portfolio
$\mathbf{b} = (\frac{1}{m}, \ldots, \frac{1}{m})$ each
period is named Uniform Constant Rebalanced Portfolios (UCRP). It is possible to
calculate an optimal offline portfolio for the CRP strategy as

$$\mathbf{b}^\star = \arg\max_{\mathbf{b} \in \Delta_m} \log S_n(CRP(\mathbf{b})) = \arg\max_{\mathbf{b} \in \Delta_m} \sum_{t=1}^{n} \log(\mathbf{b}^\top \mathbf{x}_t),$$

which is a convex optimization problem (the objective is concave in $\mathbf{b}$) and can be
efficiently solved. The CRP strategy with $\mathbf{b}^\star$ is denoted
as Best Constant Rebalanced Portfolios (BCRP). BCRP achieves a final cumulative
portfolio wealth and corresponding exponential growth rate defined as follows,

$$S_n(BCRP) = \max_{\mathbf{b} \in \Delta_m} S_n(CRP(\mathbf{b})) = S_n(CRP(\mathbf{b}^\star)),$$

$$W_n(BCRP) = \frac{1}{n} \log S_n(BCRP) = \frac{1}{n} \log S_n(CRP(\mathbf{b}^\star)).$$

Note that BCRP strategy is a hindsight strategy, which can only be calculated with
complete market sequences. Cover [1991] proved the benefits of BCRP as a target,
that is, BCRP exceeds the best stock, Value Line Index (geometric mean of component
returns) and Dow Jones Index (arithmetic mean of component returns, or BAH). Moreover,
BCRP is invariant under permutations of the price relative sequences, that is, it
does not depend on the order in which $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ occur.
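For a two-asset market the hindsight optimization behind BCRP reduces to a one-dimensional concave maximization, which the sketch below solves by ternary search. Restricting to two assets and using ternary search are simplifying assumptions of this sketch (a general market needs a proper convex solver), and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def log_wealth(b: float, market: Sequence[Tuple[float, float]]) -> float:
    """log S_n of the CRP that keeps weight b on asset 1 and 1 - b on asset 2."""
    return sum(math.log(b * x1 + (1.0 - b) * x2) for x1, x2 in market)


def bcrp_two_assets(market: Sequence[Tuple[float, float]], iters: int = 200) -> float:
    """Weight on asset 1 of the best CRP in hindsight (ternary search on a concave function)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_wealth(m1, market) < log_wealth(m2, market):
            lo = m1   # the maximizer cannot lie left of m1
        else:
            hi = m2   # the maximizer cannot lie right of m2
    return 0.5 * (lo + hi)


# Cash plus an asset that alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 2
b_star = bcrp_two_assets(market)
print(b_star, math.exp(log_wealth(b_star, market)))
```

On this market the best stock ends with wealth 1, while the best CRP (the even split) ends with wealth above 1, which is consistent with the statement that BCRP exceeds the best stock.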

Now let us compare the BAH and CRP strategies by continuing Example 2.1.

Example 3.1 (Virtual market by Cover and Gluss [1986]). Assume a two-asset market
with one cash and one volatile asset with the price relative sequence
$\mathbf{x}_1^n = \{(1, 2), (1, \frac{1}{2}), (1, 2), \ldots\}$.
Let us consider the BAH with uniform initial portfolio $\mathbf{b}_1 = (\frac{1}{2}, \frac{1}{2})$
and the CRP with uniform portfolio $\mathbf{b} = (\frac{1}{2}, \frac{1}{2})$. Clearly, since no
asset grows in the long run, the final wealth of BAH equals the uniform weighted summation
of two assets, which roughly equals 1 in the long run. On the other side, according to the
analysis of Example 2.1, the final cumulative wealth of CRP is roughly
$\left(\frac{9}{8}\right)^{\frac{n}{2}}$, which increases exponentially. Note
that the BAH only rebalances on the 1st period, while the CRP rebalances every period.
On the same virtual market, while the market provides no return, CRP can produce an
exponentially increasing return. The underlying idea of CRP is to take advantage of
the underlying volatility, or so-called "volatility pumping" [Luenberger, 1998, Chapter
15].
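The contrast between the two strategies on this virtual market is easy to reproduce numerically. The following lines are a minimal sketch; the horizon of 20 periods is an arbitrary choice made here.

```python
from math import prod

n = 20                                        # an even number of periods
market = [(1.0, 2.0), (1.0, 0.5)] * (n // 2)  # cash plus the alternating volatile asset

# Uniform BAH: split the capital once, then let each half ride its own asset.
bah = 0.5 * prod(x[0] for x in market) + 0.5 * prod(x[1] for x in market)

# Uniform CRP: re-split the capital equally at the start of every period.
crp = prod(0.5 * x[0] + 0.5 * x[1] for x in market)

print(f"BAH wealth: {bah:.4f}, CRP wealth: {crp:.4f}")
```

The only difference between the two computations is where the averaging happens: BAH averages the final asset values, while CRP averages the price relatives inside every period before multiplying.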

Since CRP rebalances to a fixed portfolio each period, its frequent transactions will
incur high transaction costs. Helmbold et al. [1996, 1998] proposed a Semi-Constant
Rebalanced Portfolios (Semi-CRP), which rebalances the portfolio only on selected
periods rather than every period.

One desired theoretical result for on-line portfolio selection is the "universality"
proposed by Cover [1991]. An online portfolio selection algorithm $Alg$ is universal if
the average (external) regret Stoltz and Lugosi [2005], Blum and Mansour [2007] for
$n$ periods asymptotically approaches 0, that is,

$$\frac{1}{n} \mathrm{regret}_n(Alg) = W_n(BCRP) - W_n(Alg) \longrightarrow 0, \quad \text{as } n \to \infty. \qquad (4)$$

In other words, a universal portfolio selection algorithm asymptotically approaches
the same exponential growth rate as BCRP strategy for arbitrary sequences of price
relatives.



3.2 Follow-the-Winner Approaches

The first approach, Follow-the-Winner, is characterized by increasing the relative weights
of more successful experts/stocks. Rather than targeting market and best stock,
algorithms in this category often aim to track BCRP strategy, the optimal strategy in an
i.i.d. market [Cover and Thomas, 1991, Theorem 15.3.1]. In other words, an algorithm in
this category aims to be a universal portfolio selection algorithm.

3.2.1 Universal Portfolios

The basic idea of Universal Portfolios-type algorithms is to assign the capital to a single
class of base experts, let the experts run, and finally pool their wealth. Strategies of this
type are analogous to the Buy And Hold (BAH) strategy. Their difference is that the base
BAH expert is the strategy investing on a single stock and thus the number of experts is
the same as that of stocks. In other words, BAH strategy buys the individual stocks and
lets the stocks go and finally pools their individual wealth. On the other hand, the base
expert in the Follow-the-Winner category can be any strategy class that invests over
the whole market. Besides, algorithms in this category are also similar to the
Meta-Learning Algorithms (MLA) further described in Section 3.5, while MLA generally
applies to experts of multiple classes.



Cover [1991] proposed the Universal Portfolios (UP) strategy and Cover and Thomas
[1991] also further analyzed it. Formally, its update scheme is the historical
performance weighted average of all possible constant rebalanced portfolios, that is,

$$\mathbf{b}_1 = \left( \frac{1}{m}, \ldots, \frac{1}{m} \right), \qquad \mathbf{b}_{t+1} = \frac{\int_{\Delta_m} \mathbf{b} S_t(\mathbf{b}) \, d\mathbf{b}}{\int_{\Delta_m} S_t(\mathbf{b}) \, d\mathbf{b}}.$$

In detail, the initial portfolio $\mathbf{b}_1$ is uniform over the market, and the portfolio
for the $(t+1)$-th period is the historical performance weighted average of all CRP experts
with $\mathbf{b} \in \Delta_m$.
Thus, the final cumulative wealth achieved by Cover's UP is the uniform weighted average
of CRP experts' wealth,

$$S_n(UP) = \int_{\Delta_m} S_n(\mathbf{b}) \, d\mathbf{b},$$

where $d\mathbf{b}$ is understood as the uniform probability measure on $\Delta_m$, that is,
normalized by $\int_{\Delta_m} d\mathbf{b}$.

Intuitively, Cover's UP operates similar to a Fund of Funds (FOF), and its main idea
is to buy and hold the parameterized CRP strategies over the simplex domain. In
particular, it gives an initial proportion of wealth $d\mathbf{b} / \int_{\Delta_m} d\mathbf{b}$
to each portfolio manager operated by one CRP strategy with $\mathbf{b} \in \Delta_m$. Then
the managers make their final wealth $S_n(\mathbf{b}) = e^{n W_n(\mathbf{b})} d\mathbf{b}$
individually at an exponential rate of $W_n(\mathbf{b})$. Finally, Cover's UP
pools the wealth at the end resulting in a terminal wealth of $S_n(UP)$. Alternatively, if
a loss function is defined as the negative logarithmic function of portfolio return, Cover's
UP is actually an exponentially weighted average forecaster Cesa-Bianchi and Lugosi
[2006].

It is well known that under suitable smoothness conditions, the average of exponentials
has the same exponential growth rate as that of the maximum, so one can asymptotically
achieve the same exponential growth rate as that of BCRP. The regret achieved
by Cover's UP is $O(m \log n)$, and its time complexity is $O(n^m)$, where $m$ denotes the
number of stocks and $n$ refers to the number of periods.
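Since the exact integrals over the simplex are expensive, a rough way to get a feel for the update is to replace them with an average over randomly sampled CRP experts. The sketch below does exactly that; it is a Monte Carlo approximation chosen here for illustration, not Cover's exact algorithm, and the function names, the number of sampled experts, and the seed are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sample_simplex(m: int, rng: random.Random) -> List[float]:
    """Draw one point uniformly from the simplex (normalized i.i.d. exponentials)."""
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


def universal_portfolio(market: Sequence[Vector], num_experts: int = 2000,
                        seed: int = 0) -> Tuple[List[List[float]], float, List[float]]:
    """Return (portfolios b_1..b_n, final wealth, final wealth of each sampled expert)."""
    rng = random.Random(seed)
    m = len(market[0])
    experts = [sample_simplex(m, rng) for _ in range(num_experts)]
    expert_wealth = [1.0] * num_experts   # S_t(b) for each sampled CRP expert
    portfolios, wealth = [], 1.0
    for x in market:
        total = sum(expert_wealth)
        # Performance-weighted average of the experts' portfolios.
        b_t = [sum(w * b[i] for w, b in zip(expert_wealth, experts)) / total
               for i in range(m)]
        portfolios.append(b_t)
        wealth *= sum(bi * xi for bi, xi in zip(b_t, x))
        expert_wealth = [w * sum(bi * xi for bi, xi in zip(b, x))
                         for w, b in zip(expert_wealth, experts)]
    return portfolios, wealth, expert_wealth


market = [(1.0, 2.0), (1.0, 0.5)] * 5
portfolios, wealth, expert_wealth = universal_portfolio(market)
print(portfolios[0], wealth)
```

In this sampled version the final wealth equals the plain average of the sampled experts' final wealth, which mirrors the statement that Cover's UP earns the uniform weighted average of the CRP experts' wealth.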

As Cover's UP is based on an ideal market model, one research topic with respect
to Cover's UP is to extend the algorithm with various realistic assumptions.
Cover and Ordentlich [1996] considered side information, that is, experts' opinions,
fundamental data, etc. Cover and Ordentlich [1998] extended the algorithm with short
selling and margin, and Blum and Kalai [1999] took account of transaction costs.

Another research topic is to generalize Cover's UP with different underlying base
expert classes, rather than the CRP strategy. Jamshidian [1992] generalized the
algorithm for continuous time market and presented the long-term performance of Cover's
UP. Vovk and Watkins [1998] applied aggregating algorithm (AA) Vovk [1990] to a
finite number of arbitrary investment strategies. Cover's UP becomes a specialized case
of AA when applied to an infinite number of CRPs. We will further investigate AA
in Section 3.2.5. Ordentlich and Cover [1998] analyzed the minimal ratio of the final
wealth achieved by any non-anticipating investment strategy to that of BCRP strategy
and provided a strategy to achieve such optimal ratio. Cross and Barron [2003]
generalized Cover's UP from CRP strategy class to any parameterized target class and
proposed a computation favorable universal strategy. Akcoglu et al. [2002, 2004] extended
Cover's UP from the parameterized CRP class to a wide class of investment strategies,
including trading strategies operating on a single stock and portfolio strategies
operating on the whole stock market. Kozat and Singer [2011] proposed a similar universal
algorithm based on the class of semi-constant rebalanced portfolios Helmbold et al.
[1996, 1998], which provides good performance with transaction costs.

Rather than the intuitive analysis above, several studies have discussed
the connection between Cover's UP and universal prediction [Feder et al. 1992],
data compression [Rissanen 1983], and Markowitz's mean-variance theory [Markowitz
1952, 1959]. Algoet [1992] discussed universal schemes for prediction, gambling,
and portfolio selection. Cover [1996] and Ordentlich [1996] discussed the connection
between universal portfolio selection and data compression. Belentepe [2005] presented
a statistical view of Cover's UP strategy and connected it with the traditional Markowitz
mean-variance portfolio theory [Markowitz 1952]. The author showed that, by allowing
short selling and leverage, UP is approximately equivalent to sequential mean-variance
optimization; otherwise the strategy is approximately equivalent to constrained sequential
optimization. Though its update scheme is distribution-free, UP implicitly estimates
the multivariate mean and covariance matrix.

Although Cover's UP has a good theoretical performance guarantee, its implementation
is exponential in the number of stocks, which restricts its practical use.
To handle this computational issue, Kalai and Vempala [2002] presented an efficient
implementation based on non-uniform random walks that are rapidly mixing. Their
implementation requires a polynomial running time of $O(m^7 n^8)$, which is a great
improvement over the original $O(n^m)$.



3.2.2 Exponential Gradient 

The strategies of the Exponential Gradient type generally focus on the following
optimization problem,

$$\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \; \eta \log \mathbf{b} \cdot \mathbf{x}_t - R(\mathbf{b}, \mathbf{b}_t), \tag{5}$$

where $R(\mathbf{b}, \mathbf{b}_t)$ denotes a regularization term and $\eta > 0$ denotes the learning rate. One
straightforward interpretation of the above optimization is to track the stock with the
best performance in the last period while keeping the previous portfolio information via a
regularization term.

Helmbold et al. [1996, 1998] proposed the Exponential Gradient (EG) strategy, which
is based on the algorithm proposed for the mixture estimation problem [Helmbold et al.
1997]. The EG strategy adopts relative entropy as the regularization term, that is,

$$R(\mathbf{b}, \mathbf{b}_t) = \sum_{i=1}^{m} b_i \log \frac{b_i}{b_{t,i}}.$$

EG's formulation is thus convex in $\mathbf{b}$; however, it is hard to solve since the log is nonlinear.
Thus, the authors adopted the first-order Taylor expansion of the log function at $\mathbf{b}_t$, that is,

$$\log \mathbf{b} \cdot \mathbf{x}_t \approx \log(\mathbf{b}_t \cdot \mathbf{x}_t) + \frac{\mathbf{x}_t}{\mathbf{b}_t \cdot \mathbf{x}_t} \cdot (\mathbf{b} - \mathbf{b}_t),$$

with which the first term in Eq. (5) becomes linear and easy to solve. Solving the optimization,
EG's update rule is

$$b_{t+1,i} = b_{t,i} \exp\left(\eta \frac{x_{t,i}}{\mathbf{b}_t \cdot \mathbf{x}_t}\right) / Z, \quad i = 1, \ldots, m,$$

where $Z$ denotes the normalization term such that the portfolio sums to 1.
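The multiplicative update can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function name `eg_update` and the default learning rate `eta=0.05` are chosen here only for demonstration.

```python
import math


def eg_update(b, x, eta=0.05):
    """One Exponential Gradient step.

    b   : current portfolio (non-negative, sums to 1)
    x   : price relative vector of the period just observed
    eta : learning rate (illustrative default, an assumption)
    """
    portfolio_return = sum(bi * xi for bi, xi in zip(b, x))
    # Multiplicative update: assets with larger x_i / (b . x) gain weight.
    unnormalized = [bi * math.exp(eta * xi / portfolio_return)
                    for bi, xi in zip(b, x)]
    z = sum(unnormalized)  # normalization term Z
    return [u / z for u in unnormalized]


b_next = eg_update([0.5, 0.5], [1.0, 2.0])
b_same = eg_update([0.5, 0.5], [1.0, 2.0], eta=0.0)
```

The second call shows the degenerate case: with `eta = 0` the portfolio is left unchanged, so a uniform start stays uniform.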
Besides the multiplicative update rule of the EG algorithm, the optimization problem
(5) can also be solved using the Gradient Projection (GP) and Expectation Maximization
(EM) methods [Helmbold et al. 1997]. GP and EM adopt different regularization
terms. In particular, GP adopts the L2-norm regularization, and EM adopts the
$\chi^2$ regularization term, that is,

$$R(\mathbf{b}, \mathbf{b}_t) = \begin{cases} \frac{1}{2} \sum_{i=1}^{m} (b_i - b_{t,i})^2 & \text{GP} \\ \frac{1}{2} \sum_{i=1}^{m} \frac{(b_i - b_{t,i})^2}{b_{t,i}} & \text{EM} \end{cases}$$

The final update rule of GP is

$$b_{t+1,i} = b_{t,i} + \eta \left( \frac{x_{t,i}}{\mathbf{b}_t \cdot \mathbf{x}_t} - \frac{1}{m} \sum_{j=1}^{m} \frac{x_{t,j}}{\mathbf{b}_t \cdot \mathbf{x}_t} \right),$$

and the update rule of EM is

$$b_{t+1,i} = b_{t,i} \left( \eta \left( \frac{x_{t,i}}{\mathbf{b}_t \cdot \mathbf{x}_t} - 1 \right) + 1 \right),$$

which can also be viewed as the first-order approximation of EG's update.

The regret achieved by the EG strategy is $O(\sqrt{n \log m})$ with $O(m)$ running time per
period. The regret is not as tight as that of Cover's UP; however, its linear running
time substantially surpasses that of Cover's UP. Besides, the authors also proposed a
variant, which has a regret bound of $O(m \log n)$. Though not proposed for the on-line
portfolio selection task, according to Helmbold et al. [1997], GP can straightforwardly
achieve a regret of $O(\sqrt{mn})$, which is significantly worse than that of EG.

One key parameter for EG is the learning rate $\eta > 0$. In order to achieve the regret
bound above, $\eta$ has to be small. However, as $\eta \to 0$, its update approaches the uniform
portfolio, and EG degrades to UCRP.



Das and Banerjee [2011] extended the EG algorithm to a meta-learning algorithm
named Online Gradient Updates (OGU), which combines underlying experts
such that the overall system achieves a performance that is no worse than any
convex combination of the underlying base experts. We will introduce OGU in
Section 3.5.3.



3.2.3 Follow the Leader 

Strategies in the Follow the Leader (FTL) approach try to track the Best Constant
Rebalanced Portfolio (BCRP) until time $t$, that is,

$$\mathbf{b}_{t+1} = \mathbf{b}_t^* = \arg\max_{\mathbf{b} \in \Delta_m} \sum_{\tau=1}^{t} \log(\mathbf{b} \cdot \mathbf{x}_\tau). \tag{6}$$

Clearly, this category follows the BCRP leader, and the ultimate leader is the BCRP
over the whole period.
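As a small illustration of tracking the BCRP so far, the sketch below handles the two-asset case only. It assumes strictly positive price relatives and relies on the cumulative log wealth being concave in the portfolio weight, so a ternary search suffices; the function names are illustrative and this is not the stochastic optimization procedure used in the cited works.

```python
import math


def log_wealth(b1, price_relatives):
    """Cumulative log wealth of the CRP (b1, 1 - b1)."""
    return sum(math.log(b1 * x[0] + (1.0 - b1) * x[1]) for x in price_relatives)


def bcrp_2assets(price_relatives, iterations=100):
    """BCRP in hindsight for m = 2 via ternary search on a concave objective."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_wealth(m1, price_relatives) < log_wealth(m2, price_relatives):
            lo = m1
        else:
            hi = m2
    b1 = (lo + hi) / 2.0
    return [b1, 1.0 - b1]


# FTL plays, in period t + 1, the BCRP computed on periods 1..t.
oscillating = bcrp_2assets([(1.0, 2.0), (1.0, 0.5)] * 5)
trending = bcrp_2assets([(1.0, 2.0)] * 3)
```

On the oscillating market the leader is the uniform portfolio, while on a market where the second asset always doubles the leader puts everything on that asset.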
Ordentlich [1996, Chapter 4.4] briefly mentioned a strategy to obtain portfolios by
mixing the BCRP so far and the uniform portfolio,

$$\mathbf{b}_{t+1} = \frac{t}{t+1} \mathbf{b}_t^* + \frac{1}{t+1} \frac{1}{m} \mathbf{1}.$$

He also showed its worst-case bound, which is slightly worse than that of Cover's UP.



Gaivoronski and Stella [2000] proposed Successive Constant Rebalanced Portfolios
(SCRP) and Weighted Successive Constant Rebalanced Portfolios (WSCRP) for
stationary markets. For each period, SCRP directly adopts the BCRP portfolio until
now, that is,

$$\mathbf{b}_{t+1} = \mathbf{b}_t^*.$$

The authors further solved the optimal portfolio $\mathbf{b}_t^*$ via stochastic optimization
[Birge and Louveaux 1997], resulting in the detailed updates of SCRP [Gaivoronski and
Stella 2000, Algorithm 1]. On the other hand, WSCRP outputs a convex combination of
the SCRP portfolio and the last portfolio,

$$\mathbf{b}_{t+1} = (1 - \gamma) \mathbf{b}_t^* + \gamma \mathbf{b}_t,$$

where $\gamma \in [0, 1]$ represents the trade-off parameter.

The regret bounds achieved by SCRP and WSCRP are both O (m log n), which is 
the same as Cover's UP. 

Rather than assuming that the historical market is stationary, some algorithms assume
that the historical market is non-stationary. Gaivoronski and Stella [2000] proposed
Variable Rebalanced Portfolios (VRP), which calculates the BCRP portfolio based on the
latest sliding window. To be more specific, VRP updates its portfolio as follows,

$$\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \sum_{i=t-W+1}^{t} \log(\mathbf{b} \cdot \mathbf{x}_i),$$

where $W$ denotes a specified window size. Following their algorithms for Constant
Rebalanced Portfolios (CRP), they further proposed Successive Variable Rebalanced
Portfolios (SVRP) and Weighted Successive Variable Rebalanced Portfolios (WSVRP).
No theoretical results were given for the two algorithms.

Gaivoronski and Stella [2003] further generalized Gaivoronski and Stella [2000]
and proposed Adaptive Portfolio Selection (APS) for the on-line portfolio selection task.
By changing the objective part, APS can handle three types of portfolio selection tasks,
that is, adaptive Markowitz portfolio, log-optimal constant rebalanced portfolio, and
index tracking. To handle the transaction cost issue, they proposed Threshold Portfolio
Selection (TPS), which only rebalances the portfolio if the expected return of the new
portfolio exceeds that of the previous portfolio by more than a threshold.

3.2.4 Follow the Regularized Leader 

Another category of approaches follows a similar idea to FTL while adding a regularization
term, and thus becomes the Follow the Regularized Leader (FTRL) approach.
In general, FTRL approaches can be formulated as follows,

$$\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \sum_{\tau=1}^{t} \log(\mathbf{b} \cdot \mathbf{x}_\tau) - \frac{\beta}{2} R(\mathbf{b}), \tag{7}$$

where $\beta$ denotes the trade-off parameter and $R(\mathbf{b})$ is a regularization term on $\mathbf{b}$. Note
that here all historical information is captured in the first term, thus the regularization term
only concerns the next portfolio, which is different from that of the EG algorithm. One
typical regularization is the L2-norm, that is, $R(\mathbf{b}) = \|\mathbf{b}\|^2$.



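Agarwal et al. [2006] proposed the Online Newton Step (ONS), by solving the optimization
problem with L2-norm regularization via on-line convex optimization
techniques [Zinkevich 2003; Hazan et al. 2006; Hazan 2006; Hazan et al. 2007].
Similar to the Newton method for offline optimization, the basic idea is to replace the log
term with its second-order Taylor expansion at $\mathbf{b}_t$, and then solve the problem for a
closed-form update scheme. Finally, ONS' update rule is

$$\mathbf{b}_1 = \left( \frac{1}{m}, \ldots, \frac{1}{m} \right), \quad \mathbf{b}_{t+1} = \Pi_{\Delta_m}^{\mathbf{A}_t} \left( \delta \mathbf{A}_t^{-1} \mathbf{p}_t \right),$$

with

$$\mathbf{A}_t = \sum_{\tau=1}^{t} \frac{\mathbf{x}_\tau \mathbf{x}_\tau^\top}{(\mathbf{b}_\tau \cdot \mathbf{x}_\tau)^2} + \mathbf{I}_m, \quad \mathbf{p}_t = \left( 1 + \frac{1}{\beta} \right) \sum_{\tau=1}^{t} \frac{\mathbf{x}_\tau}{\mathbf{b}_\tau \cdot \mathbf{x}_\tau},$$

where $\beta$ is the trade-off parameter, $\delta$ is a scale term, and $\Pi_{\Delta_m}^{\mathbf{A}_t}(\cdot)$ is an exact projection
to the simplex domain.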

ONS iteratively updates the first and second order information and the portfolio
with a time cost of $O(m^3)$, which is independent of the number of historical instances
$t$. The authors also proved ONS's regret bound of $O(m \log n)$, which is as tight as
that of Cover's UP.

While FTRL, or even the Follow-the-Winner category, mainly focuses on worst-case
investing, Hazan and Kale [2009, 2012] linked the worst-case model with the practically
widely used average-case investing, that is, the Geometric Brownian Motion (GBM)
model [Bachelier 1900; Osborne 1959; Cootner 1964], which is a probabilistic
model of stock price returns. The authors also designed an investment strategy that
is universal in the worst case and is capable of exploiting the GBM model. Their algorithm,
the so-called Exp-Concave-FTL, follows a slightly different form of optimization
problem (7) with L2-norm regularization, that is,

$$\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \sum_{\tau=1}^{t} \log(\mathbf{b} \cdot \mathbf{x}_\tau) - \frac{1}{2} \|\mathbf{b}\|^2.$$

Similar to ONS, the optimization problem can be efficiently solved via online convex
optimization techniques. The authors further analyzed its regret bound and linked
it with the GBM model. Under the GBM model, the regret bound is $O(m \log Q)$,
where $Q$ is a quadratic variability, calculated as $n - 1$ times the sample variance of the
sequence of price relative vectors. Since $Q$ is typically much smaller than $n$, the regret
bound is significantly improved from the previous $O(m \log n)$.
Besides the improved regret bound, the authors also discussed the relationship of
their algorithm's performance to trading frequency. The authors asserted that increasing
the trading frequency would decrease the variance of the minimum variance CRP,
that is, the more frequently they trade, the more likely the payoff will be close to the
expected value. On the other hand, the regret stays the same even if they trade more.
Consequently, the performance of such an algorithm is expected to improve as the
trading frequency increases [Agarwal et al. 2006].



Das and Banerjee [2011] further extended the FTRL approach to a generalized
meta-learning algorithm, Online Newton Update (ONU), which guarantees that the
overall performance is no worse than any convex combination of the underlying experts.

3.2.5 Aggregating-type Algorithms 

Though BCRP is the optimal strategy for an i.i.d. market, this assumption is
often doubtful in real markets, in which the optimal portfolio may not belong to the CRP
or fixed fraction portfolio class. Some algorithms have been designed to track a different
set of experts. The algorithms in this category share a similar expert learning idea
with the Meta-Learning Algorithms in Section 3.5; however, here the base experts are of
a special class, that is, individual experts that invest fully in a single stock, while in
general Meta-Learning Algorithms often apply to more complex experts from multiple
classes.

Vovk and Watkins [1998] applied the Aggregating Algorithm (AA) [Vovk 1990,
1997, 1999, 2001] to the on-line portfolio selection task, of which Cover's UP is a special
case. The general setting for AA is to define a set of base experts and sequentially
allocate the resource among multiple base experts in order to achieve a good performance
that is no worse than any underlying expert. Given a learning rate $\eta > 0$, a
measurable set of experts $A$, and a prior distribution $P_0$ that assigns the initial weights to the
experts, AA defines a loss function $\ell(\mathbf{x}, \mathbf{b})$ and $\mathbf{b}(\theta)$ as an action with respect to
the quantity $\theta$. At each period $t = 1, 2, \ldots$, AA updates the experts' weights as

$$P_t(A) = \int_A \beta^{\ell(\mathbf{x}_t, \mathbf{b}(\theta))} P_{t-1}(d\theta),$$

where $\beta = e^{-\eta}$ and $P_t$ denotes the weights to the experts at time $t$. One special case is Cover's
Universal Portfolios, which corresponds to $\ell(\mathbf{x}_t, \mathbf{b}) = -\log(\mathbf{b} \cdot \mathbf{x}_t)$.
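For intuition, the weight update can be sketched for a finite pool of experts. This is a simplified illustration under several assumptions: the integral is replaced by a finite sum, the standard AA discount $\beta = e^{-\eta}$ is used, the loss is the negative log return, and the weights are renormalized after each step for convenience. The function name is hypothetical.

```python
import math


def aa_update_weights(weights, expert_portfolios, x, eta=1.0):
    """One AA-style weight update over a finite set of experts (sketch).

    Each weight is multiplied by beta ** loss with beta = exp(-eta) and
    loss = -log(b . x); the result is renormalized to sum to 1.
    """
    updated = []
    for w, b in zip(weights, expert_portfolios):
        loss = -math.log(sum(bi * xi for bi, xi in zip(b, x)))
        updated.append(w * math.exp(-eta * loss))
    total = sum(updated)
    return [w / total for w in updated]


# Two base experts, each investing fully in a single stock.
experts = [[1.0, 0.0], [0.0, 1.0]]
new_weights = aa_update_weights([0.5, 0.5], experts, [1.0, 2.0])
```

With `eta = 1` each weight is simply multiplied by the expert's return for the period, so weights become proportional to the experts' cumulative wealth.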



Several further applications have been proposed for the AA algorithm. Singer
[1997] proposed Switching Portfolios (SP), which is a regime-based trading strategy,
that is, SP switches among a set of strategies corresponding to different regimes. At
the beginning of each period, SP combines all strategies with the prior distribution to
construct a portfolio. The author proposed two switching portfolio strategies, both of
which assume that the duration of using one base strategy is geometrically distributed.
While the first strategy assumes a fixed distribution parameter, the second assumes
that the distribution parameter changes dynamically with respect to the duration.
Theoretically, the author further gave a lower bound on the logarithmic total wealth
achieved with respect to the best of the switching regimes. Empirical results show that
SP can outperform UP, EG, and BCRP.
Levina and Shafer [2008] proposed the Gaussian Random Walk (GRW) strategy,
which switches among the base experts according to a Gaussian distribution. Kozat and
Singer [2007] extended SP to piecewise fixed fraction strategies, which partition the periods
into different segments and transit among these segments. The authors proved the
piecewise universality of their algorithm, which can achieve the performance of the optimal
piecewise fixed fraction strategy. Kozat and Singer [2008] extended Kozat and Singer
[2007] to the case of transaction costs. Kozat and Singer [2009, 2010] further generalized
Kozat and Singer [2007] to the sequential decision problem. Kozat et al. [2008]
proposed another piecewise universal portfolio selection strategy via context trees,
and Kozat et al. [2011] generalized it to the sequential decision problem via tree weighting.

The most interesting thing is that switching portfolios adopt the notion of regime
switching [Hamilton 1994, 2008], which is different from the underlying assumption
of universal portfolio selection methods and seems more plausible than an i.i.d.
market. Regime switching is also applied to some state-of-the-art trading strategies
[Hardy 2001; Mlnarik et al. 2009]. However, a key limitation of this approach is
its distribution assumption, as geometric and Gaussian distributions do not seem
to fit the market well. This leads to other potential distributions that can better model
the markets.



3.3 Follow-the-Loser Approaches



The underlying assumption for the optimality of the BCRP strategy is that the market is i.i.d.,
which however does not always hold for real-world data and thus often results in
inferior empirical performance, as observed in the previous literature. Instead of
tracking the winners, the Follow-the-Loser approach is often characterized by transferring
wealth from winners to losers. The assumption underlying this
approach is mean reversion [Bondt and Thaler 1985; Poterba and Summers 1988;
Lo and MacKinlay 1990], which means that good (poor)-performing assets will
perform poorly (well) in the following periods.

To better understand the underlying assumption of this approach, let us continue
the example of Li et al. [2012] to show the power of mean reversion.



Example 3.2 (Virtual market by Cover and Gluss [1986]). Assume a two-asset market
with one cash and one volatile asset, and the price relative sequence is $\mathbf{x}^n =
\{(1, 2), (1, \frac{1}{2}), (1, 2), \ldots\}$. Let us consider the BAH with uniform initial portfolio
$\mathbf{b}_1 = (\frac{1}{2}, \frac{1}{2})$ and the CRP with uniform portfolio $\mathbf{b} = (\frac{1}{2}, \frac{1}{2})$. As illustrated in
Example 3.1 on the same virtual market, the market provides no return and CRP results in an
exponentially increasing return.

Suppose the initial CRP portfolio is $(\frac{1}{2}, \frac{1}{2})$ and at the end of the 1st trading day,
the closing price adjusted wealth proportion becomes $(\frac{1}{3}, \frac{2}{3})$ with the corresponding
cumulative wealth increasing by a factor of $\frac{3}{2}$. At the beginning of the 2nd trading period,
the portfolio manager rebalances the portfolio to the initial portfolio $(\frac{1}{2}, \frac{1}{2})$ by transferring
wealth from the good-performing stock (B) to the poor-performing stock (A). At the beginning
of the 3rd trading day, the wealth transfer with the mean reversion trading idea
continues. Though the BAH strategy gains nothing, BCRP can achieve a growth rate
of $\frac{9}{8}$ per two trading periods, which implicitly assumes that if one stock performs poorly,
it tends to perform well in the subsequent trading period.



Table 2: Example to illustrate the mean reversion trading idea.

# Period   Relative (A,B)   CRP          CRP Return   Portfolio Holdings   Notes
1          (1, 2)           (1/2, 1/2)   3/2          (1/3, 2/3)           B → A
2          (1, 1/2)         (1/2, 1/2)   3/4          (2/3, 1/3)           A → B
3          (1, 2)           (1/2, 1/2)   3/2          (1/3, 2/3)           B → A
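The numbers in this example can be checked with a short simulation. The sketch below uses exact rational arithmetic and compares the uniform CRP with the uniform BAH on the two-asset virtual market; the function name `simulate` is illustrative.

```python
from fractions import Fraction


def simulate(price_relatives):
    """Return (CRP wealth, BAH wealth), both starting from the uniform portfolio."""
    half = Fraction(1, 2)
    crp_wealth = Fraction(1)
    bah_holdings = [half, half]  # BAH never rebalances
    for x in price_relatives:
        # CRP rebalances back to (1/2, 1/2) before every period.
        crp_wealth *= half * x[0] + half * x[1]
        bah_holdings = [h * xi for h, xi in zip(bah_holdings, x)]
    return crp_wealth, sum(bah_holdings)


# Four periods of the oscillating market (1, 2), (1, 1/2), (1, 2), (1, 1/2).
market = [(Fraction(1), Fraction(2)), (Fraction(1), Fraction(1, 2))] * 2
crp, bah = simulate(market)
```

After four periods the CRP wealth equals $(9/8)^2 = 81/64$, while BAH is back at its initial wealth of 1.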



3.3.1 Anti Correlation 



Borodin et al. [2003, 2004] proposed a Follow-the-Loser portfolio strategy named the Anti
Correlation (Anticor) strategy. Unlike Cover's UP, which makes no distributional assumption,
the Anticor strategy assumes that the market follows the mean reversion principle. To
exploit the mean reversion property, it statistically bets on the consistency of
positive lagged cross-correlation and negative auto-correlation.

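To obtain a portfolio for the $(t+1)$-th period, Anticor adopts logarithmic price relatives
[Hull 2008] in two specific market windows, that is, $\mathbf{y}_1 = \log(\mathbf{x}_{t-2w+1}^{t-w})$ and
$\mathbf{y}_2 = \log(\mathbf{x}_{t-w+1}^{t})$. It then calculates the cross-correlation matrix between $\mathbf{y}_1$ and $\mathbf{y}_2$,

$$M_{cov}(i, j) = \frac{1}{w - 1} (\mathbf{y}_{1,i} - \bar{y}_{1,i})^\top (\mathbf{y}_{2,j} - \bar{y}_{2,j}),$$

$$M_{cor}(i, j) = \begin{cases} \frac{M_{cov}(i, j)}{\sigma_1(i) \sigma_2(j)} & \sigma_1(i), \sigma_2(j) \neq 0 \\ 0 & \text{otherwise} \end{cases}$$

where $\bar{y}_{1,i}$ and $\sigma_1(i)$ denote the mean and standard deviation of the $i$-th column of $\mathbf{y}_1$,
and $\bar{y}_{2,j}$ and $\sigma_2(j)$ are defined similarly for $\mathbf{y}_2$.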

Then, according to the cross-correlation matrix, the Anticor algorithm transfers wealth
following the mean reversion trading idea, that is, it moves proportions from the
stocks that increased more to the stocks that increased less, and the corresponding amounts are
adjusted according to the cross-correlation matrix. In particular, if asset $i$ increases
more than asset $j$ and their sequences in the window are positively correlated, Anticor
claims a transfer from asset $i$ to $j$ with an amount equal to the cross-correlation
value ($M_{cor}(i, j)$) minus their negative auto-correlation values ($\min\{0, M_{cor}(i, i)\}$
and $\min\{0, M_{cor}(j, j)\}$). These transfer claims are finally normalized to keep the
portfolio in the simplex domain.
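The cross-correlation step can be sketched as follows. This is a minimal illustration that covers only the computation of the correlation matrix, not the transfer claims; it assumes each window is given as a list of $w \ge 2$ rows of $m$ positive price relatives, and the function name is hypothetical.

```python
import math


def anticor_cross_correlation(window1, window2):
    """Cross-correlation matrix between the columns of two log-return windows.

    window1, window2: lists of w rows, each row holding m price relatives.
    Entry [i][j] correlates asset i in window1 with asset j in window2 and
    is set to 0 when either standard deviation is zero.
    """
    w, m = len(window1), len(window1[0])
    y1 = [[math.log(v) for v in row] for row in window1]
    y2 = [[math.log(v) for v in row] for row in window2]

    def column(matrix, j):
        return [row[j] for row in matrix]

    def mean(values):
        return sum(values) / len(values)

    def std(values):
        mu = mean(values)
        return math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))

    m_cor = [[0.0] * m for _ in range(m)]
    for i in range(m):
        c1 = column(y1, i)
        mu1, s1 = mean(c1), std(c1)
        for j in range(m):
            c2 = column(y2, j)
            mu2, s2 = mean(c2), std(c2)
            cov = sum((a - mu1) * (b - mu2) for a, b in zip(c1, c2)) / (w - 1)
            if s1 > 0 and s2 > 0:
                m_cor[i][j] = cov / (s1 * s2)
    return m_cor


window = [[1.1, 0.9, 1.0], [0.9, 1.1, 1.0], [1.2, 0.8, 1.0]]
m_cor = anticor_cross_correlation(window, window)
```

In this toy input the first two assets move in opposite directions, so their cross-correlation is negative, and the third asset is constant, so its entries fall back to 0.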

Due to its mean reversion nature, it is difficult to obtain a useful bound such as the
universal regret bound. Although it is heuristic and has no theoretical guarantee, Anticor
empirically outperformed all other strategies at the time. On the other hand, its heuristic
nature cannot fully exploit the mean reversion property. Thus, exploiting
the property using systematic learning algorithms is highly desired.

3.3.2 Passive Aggressive Mean Reversion 



Li et al. [2012] proposed the Passive Aggressive Mean Reversion (PAMR) strategy, which
exploits the mean reversion property with Passive Aggressive (PA) online learning
[Shalev-Shwartz et al. 2003; Crammer et al. 2006].



The main idea of PAMR is to design a loss function that reflects the mean
reversion property, that is, if the expected return based on the last price relative is larger
than a threshold, the loss will linearly increase; otherwise, the loss is zero. In particular,
the authors defined the $\epsilon$-insensitive loss function for the $t$-th period as,

$$\ell_\epsilon(\mathbf{b}; \mathbf{x}_t) = \begin{cases} 0 & \mathbf{b} \cdot \mathbf{x}_t \le \epsilon \\ \mathbf{b} \cdot \mathbf{x}_t - \epsilon & \text{otherwise} \end{cases}$$

where $0 \le \epsilon \le 1$ is a sensitivity parameter to control the mean reversion threshold.
Based on the loss function, PAMR passively maintains the last portfolio if the loss is zero;
otherwise it aggressively approaches a new portfolio that forces the loss to zero. In
summary, PAMR obtains the next portfolio via the following optimization problem,

$$\mathbf{b}_{t+1} = \arg\min_{\mathbf{b} \in \Delta_m} \frac{1}{2} \|\mathbf{b} - \mathbf{b}_t\|^2 \quad \text{s.t.} \quad \ell_\epsilon(\mathbf{b}; \mathbf{x}_t) = 0. \tag{8}$$



Solving the optimization problem (8), PAMR has a clean closed form update scheme,

$$\mathbf{b}_{t+1} = \mathbf{b}_t - \tau_t (\mathbf{x}_t - \bar{x}_t \mathbf{1}), \quad \tau_t = \max\left\{ 0, \frac{\mathbf{b}_t \cdot \mathbf{x}_t - \epsilon}{\|\mathbf{x}_t - \bar{x}_t \mathbf{1}\|^2} \right\},$$

where $\bar{x}_t$ denotes the average of the elements of $\mathbf{x}_t$. Since the authors ignored the
non-negativity constraint of the portfolio in the derivation, they also added a simplex
projection step [Duchi et al. 2008]. The closed form update
scheme clearly reflects the mean reversion trading idea by transferring wealth from
the good-performing stocks to the poor-performing stocks. Besides the optimization
problem (8), they also proposed two variants to handle noisy price relatives, by introducing
non-negative slack variables into the optimization, which is similar to soft
margin support vector machines.
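A compact sketch of the basic update, followed by a Euclidean projection onto the simplex, is given below. The sort-based projection routine is one common way to implement that step, and the default `epsilon=0.5` and the function names are assumptions made for illustration; the slack-variable variants are not covered.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        candidate = (cumulative - 1.0) / k
        if uk - candidate > 0:
            theta = candidate  # keep the threshold of the largest valid k
    return [max(vi - theta, 0.0) for vi in v]


def pamr_update(b, x, epsilon=0.5):
    """One basic PAMR step: passive if b . x <= epsilon, aggressive otherwise."""
    x_bar = sum(x) / len(x)
    deviation = [xi - x_bar for xi in x]
    norm_sq = sum(d * d for d in deviation)
    loss = max(0.0, sum(bi * xi for bi, xi in zip(b, x)) - epsilon)
    tau = loss / norm_sq if norm_sq > 0 else 0.0
    # Move weight away from assets that performed above average.
    return project_to_simplex([bi - tau * d for bi, d in zip(b, deviation)])


aggressive = pamr_update([0.5, 0.5], [1.0, 2.0])
passive = pamr_update([0.5, 0.5], [0.4, 0.5])
```

In the first call asset B has just doubled, so the update shifts all wealth to the poorer-performing asset A; in the second call the portfolio return is below the threshold and the portfolio is kept.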

Similar to the Anticor algorithm, due to PAMR's mean reversion nature, it is hard to
obtain a meaningful theoretical regret bound. Nevertheless, PAMR achieved significant
performance, beating all algorithms at the time, and is robust with respect to its
parameters. It also enjoys linear update time and runs extremely fast in back tests,
which shows its practicability for large-scale real-world applications.

The underlying idea is to exploit single-period mean reversion, which is empirically
verified by its evaluations on several real market datasets. However, PAMR has a
drawback in risk management, since it suffers significant performance degradation
if the underlying single-period mean reversion fails to exist. Such a drawback
is clearly indicated by its performance on the DJIA dataset [Borodin et al. 2003, 2004;
Li et al. 2012].



3.3.3 Confidence Weighted Mean Reversion 



Li et al. [2011a] proposed the Confidence Weighted Mean Reversion (CWMR) algorithm
to further exploit the second order portfolio information following the mean reversion
trading idea via Confidence Weighted (CW) online learning [Dredze et al. 2008;
Crammer et al. 2008, 2009; Dredze et al. 2010].
The basic idea of CWMR is to model the portfolio vector as a multivariate Gaussian
distribution with mean $\boldsymbol{\mu} \in \mathbb{R}^m$ and diagonal covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{m \times m}$,
which has nonzero diagonal elements and zero off-diagonal elements. While the
mean represents the knowledge for the portfolio, the diagonal covariance matrix term
stands for the confidence we have in the corresponding portfolio mean. Then CWMR
sequentially updates the mean and covariance matrix of the Gaussian distribution and
draws portfolios from the distribution at the beginning of a period. In particular, the
authors define $\mathbf{b}_t \sim \mathcal{N}(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t)$ and update the distribution parameters according to the
similar idea of PA learning, that is, CWMR keeps the next distribution close to the last
distribution in terms of Kullback-Leibler divergence if the probability of a portfolio
return lower than $\epsilon$ is higher than a specified threshold. In summary, the optimization
problem to be solved is,

$$(\boldsymbol{\mu}_{t+1}, \boldsymbol{\Sigma}_{t+1}) = \arg\min \; D_{KL}\left( \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \,\|\, \mathcal{N}(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t) \right) \quad \text{s.t.} \quad \Pr[\boldsymbol{\mu} \cdot \mathbf{x}_t \le \epsilon] \ge \theta.$$


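To solve the optimization, Li et al. [2011a] transformed the optimization problem using
two techniques. One transformed optimization problem (CWMR-Var) is,

$$(\boldsymbol{\mu}_{t+1}, \boldsymbol{\Sigma}_{t+1}) = \arg\min \; \frac{1}{2} \left( \log \frac{\det \boldsymbol{\Sigma}_t}{\det \boldsymbol{\Sigma}} + \mathrm{Tr}(\boldsymbol{\Sigma}_t^{-1} \boldsymbol{\Sigma}) \right) + \frac{1}{2} (\boldsymbol{\mu}_t - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}_t^{-1} (\boldsymbol{\mu}_t - \boldsymbol{\mu})$$

$$\text{s.t.} \quad \epsilon - \log(\boldsymbol{\mu} \cdot \mathbf{x}_t) \ge \phi \, \mathbf{x}_t^\top \boldsymbol{\Sigma} \mathbf{x}_t, \quad \boldsymbol{\mu} \cdot \mathbf{1} = 1, \quad \boldsymbol{\mu} \succeq 0.$$

Note that the log in the constraint is manually substituted to utilize the log utility.
Solving the above equation, one can obtain the closed form update scheme as,

$$\boldsymbol{\mu}_{t+1} = \boldsymbol{\mu}_t - \lambda_{t+1} \boldsymbol{\Sigma}_t \left( \frac{\mathbf{x}_t - \bar{x}_t \mathbf{1}}{\boldsymbol{\mu}_t \cdot \mathbf{x}_t} \right), \quad \boldsymbol{\Sigma}_{t+1}^{-1} = \boldsymbol{\Sigma}_t^{-1} + 2 \lambda_{t+1} \phi \, \mathbf{x}_t \mathbf{x}_t^\top,$$

where $\lambda_{t+1}$ corresponds to the Lagrangian multiplier calculated by Eq. (5) in Li et al.
[2011a] and $\bar{x}_t = \frac{\mathbf{1}^\top \boldsymbol{\Sigma}_t \mathbf{x}_t}{\mathbf{1}^\top \boldsymbol{\Sigma}_t \mathbf{1}}$ denotes the confidence weighted price relative average.
Clearly, the update scheme reflects the mean reversion trading idea and can exploit
both the first and second order information of a portfolio vector.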

Similar to Anticor and PAMR, CWMR's mean reversion nature makes it hard to
obtain a meaningful theoretical regret bound for the algorithm. Empirical results
show that the algorithm can outperform the state of the art, including PAMR, which
only exploits the first order information of a portfolio vector. However, CWMR also
exploits single-period mean reversion, and thus suffers from the same drawback as PAMR.

3.3.4 On-Line Moving Average Reversion 

Observing that PAMR and CWMR implicitly assume single-period mean reversion,
which causes one failure case on a real dataset [Li et al. 2012], Li and Hoi [2012]
defined a multiple-period mean reversion named Moving Average Reversion, and
proposed On-Line Moving Average Reversion (OLMAR) to exploit the multiple-period
mean reversion.
The basic intuition of OLMAR is the observation that PAMR and CWMR implicitly
predict the next price as the last price, that is, $\tilde{\mathbf{p}}_{t+1} = \mathbf{p}_{t-1}$, where $\mathbf{p}$ denotes the price
vector corresponding to the related $\mathbf{x}$. Such an extreme single-period prediction may cause
some drawbacks that led to the failure in certain cases in Li et al. [2012]. Instead, the
authors proposed a multiple-period mean reversion, which explicitly predicts the next
price vector as the moving average within a window. They adopted the simple moving
average, which is calculated as $MA_t(w) = \frac{1}{w} \sum_{i=t-w+1}^{t} \mathbf{p}_i$. Then, the corresponding
next price relative equals

$$\tilde{\mathbf{x}}_{t+1}(w) = \frac{MA_t(w)}{\mathbf{p}_t} = \frac{1}{w} \left( \mathbf{1} + \frac{\mathbf{1}}{\mathbf{x}_t} + \cdots + \frac{\mathbf{1}}{\bigotimes_{i=0}^{w-2} \mathbf{x}_{t-i}} \right), \tag{9}$$

where $w$ is the window size and $\bigotimes$ denotes the element-wise product.
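The moving-average prediction is straightforward to compute directly from prices. The sketch below is illustrative only: it assumes a list of price vectors with the most recent one last, and the function name is hypothetical.

```python
def predicted_price_relative(prices, w):
    """Predict the next price relative as MA_t(w) / p_t, element-wise.

    prices : list of price vectors p_1, ..., p_t (most recent last)
    w      : moving average window size (requires at least w price vectors)
    """
    if len(prices) < w:
        raise ValueError("need at least w price vectors")
    window = prices[-w:]
    last = prices[-1]
    m = len(last)
    moving_average = [sum(p[j] for p in window) / w for j in range(m)]
    return [moving_average[j] / last[j] for j in range(m)]


prices = [[1.0, 4.0], [2.0, 2.0], [3.0, 1.0]]
x_tilde = predicted_price_relative(prices, w=3)
```

Here the first asset has risen above its moving average, so its predicted price relative is below 1, while the second asset has fallen below its moving average and is predicted to rebound.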



Then, they adopted Passive Aggressive online learning [Crammer et al. 2006] to
learn a portfolio, which is similar to PAMR,

$$\mathbf{b}_{t+1} = \arg\min_{\mathbf{b} \in \Delta_m} \frac{1}{2} \|\mathbf{b} - \mathbf{b}_t\|^2 \quad \text{s.t.} \quad \mathbf{b} \cdot \tilde{\mathbf{x}}_{t+1} \ge \epsilon.$$

Different from PAMR, its formulation follows the basic intuition of investment, that
is, to achieve a good performance based on the prediction. Solving the optimization is
similar to PAMR, and we omit its solution. At the time, OLMAR achieved the best
results among all existing algorithms [Li and Hoi 2012], especially on certain datasets
on which PAMR and CWMR failed.



3.4 Pattern-Matching based Approaches 

Besides the two categories of Follow-the-Winner/Loser, another type of strategy may
utilize both winners and losers, and is based on pattern matching. This category
mainly covers the series of nonparametric sequential investment strategies, which guarantee an
optimal rate of growth of capital under the minimal assumption that the financial time
series is stationary and ergodic. Based on nonparametric prediction [Györfi and Schäfer
2003], this category consists of several pattern-matching based investment strategies
[Györfi et al. 2006, 2007, 2008; Li et al. 2011b]. Moreover, some techniques are
also applied to the sequential prediction problem [Biau et al. 2010]. Györfi et al. [2012]
collect some related papers in this category.

Now let us describe the main idea of the Pattern-Matching based approaches [Györfi et al.
2006], which consists of two steps, that is, the Sample Selection step and the Portfolio
Optimization step. The first step, the Sample Selection step, selects an index set $C$ of similar
historical price relatives, whose corresponding price relatives will be used to predict
the next price relative. After locating the similarity set, each sample price relative
$\mathbf{x}_i, i \in C$ is assigned a probability $P_i, i \in C$. Existing methods often set the
probabilities to the uniform probability $P_i = \frac{1}{|C|}$, where $|\cdot|$ denotes the cardinality of a set.
Besides uniform probability, it is possible to design a different probability setting. The
second step, the Portfolio Optimization step, is to learn an optimal portfolio based on the
similarity set obtained in the first step, that is,

$$\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} U(\mathbf{b}, C),$$
Algorithm 2: Sample selection framework ($C(\mathbf{x}_1^t, w)$).

Input: $\mathbf{x}_1^t$: Historical market sequence; $w$: window size;
Output: $C$: Index set of similar price relatives.

Initialize $C = \emptyset$;
if $t < w + 1$ then
    return;
end
for $i = w + 1, w + 2, \ldots, t$ do
    if $\mathbf{x}_{i-w}^{i-1}$ is similar to $\mathbf{x}_{t-w+1}^{t}$ then
        $C = C \cup \{i\}$;
    end
end



where $U(\cdot)$ is a specified utility function. One particular utility function is the log
utility, which is always the default utility. In case of an empty similarity set, a uniform
portfolio is adopted as the optimal portfolio.

In the following sections, we concretize the Sample Selection step in Section 3.4.1
and the Portfolio Optimization step in Section 3.4.2. We further combine the two steps
in order to formulate specific on-line portfolio selection algorithms, in Section 3.4.3.

3.4.1 Sample Selection Techniques 

The general idea in this step is to select similar samples from historical price relatives
by comparing the preceding market windows of two price relatives. Suppose we are
going to locate the price relatives that are similar to the next price relative $\mathbf{x}_{t+1}$. The basic
routine is to iterate over all historical price relatives $\mathbf{x}_i, i = w + 1, \ldots, t$ and count $\mathbf{x}_i$ as
one similar price relative if the preceding market window $\mathbf{x}_{i-w}^{i-1}$ is similar to the latest
market window $\mathbf{x}_{t-w+1}^{t}$. The set $C$ is maintained to contain the indexes of similar price
relatives. Note that a market window is a $w \times m$ matrix, and the similarity between two
market windows is often calculated on the concatenated $w \times m$-dimensional vector. The Sample
Selection procedure ($C(\mathbf{x}_1^t, w)$) is further illustrated in Algorithm 2.



Nonparametric histogram-based sample selection [Györfi and Schäfer 2003] predefines
a set of discretized partitions, partitions both the latest market window $\mathbf{x}_{t-w+1}^{t}$
and the historical market windows $\mathbf{x}_{i-w}^{i-1}, i = w + 1, \ldots, t$, and finally chooses the price
relatives whose $\mathbf{x}_{i-w}^{i-1}$ is in the same partition as $\mathbf{x}_{t-w+1}^{t}$. In particular, given a partition
$P = A_j, j = 1, 2, \ldots, d$ of $\mathbb{R}^m$ into $d$ disjoint sets and a corresponding discretization
function $G(\mathbf{x}) = j$, we can define the similarity set as,

$$C_H(\mathbf{x}_1^t, w) = \left\{ w < i < t + 1 : G(\mathbf{x}_{t-w+1}^{t}) = G(\mathbf{x}_{i-w}^{i-1}) \right\}.$$

Nonparametric kernel-based sample selection [Györfi et al. 2006] identifies the
similarity set by comparing two market windows via Euclidean distance, that is,

$$C_K(\mathbf{x}_1^t, w) = \left\{ w < i < t + 1 : \left\| \mathbf{x}_{t-w+1}^{t} - \mathbf{x}_{i-w}^{i-1} \right\| \le \frac{c}{\ell} \right\},$$

where $c$ and $\ell$ are the thresholds used to control the number of similar samples. Note that
the authors adopted two threshold parameters for theoretical analysis.
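A direct transcription of the kernel-based rule is sketched below. It assumes the history is a Python list holding $\mathbf{x}_1, \ldots, \mathbf{x}_t$ in order, returns 1-based indices as in the text, and collapses the two thresholds into a single `radius` argument (an assumption standing in for $c / \ell$); the function name is illustrative.

```python
import math


def kernel_sample_selection(history, w, radius):
    """Indices i (1-based, w < i < t + 1) whose preceding window is close
    to the latest window in Euclidean distance.

    history : list of price relative vectors x_1, ..., x_t
    w       : window size
    radius  : distance threshold (stands in for c / l, an assumption)
    """
    t = len(history)
    if t < w + 1:
        return []

    def flatten(window):
        return [v for row in window for v in row]

    latest = flatten(history[t - w:t])  # x_{t-w+1}, ..., x_t
    selected = []
    for i in range(w + 1, t + 1):
        candidate = flatten(history[i - w - 1:i - 1])  # x_{i-w}, ..., x_{i-1}
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(latest, candidate)))
        if distance <= radius:
            selected.append(i)
    return selected


history = [[1.0, 2.0], [1.0, 0.5], [1.0, 2.0], [1.0, 0.5], [1.0, 2.0]]
similar = kernel_sample_selection(history, w=1, radius=0.1)
```

On this oscillating history with `w = 1`, the selected indices point to the periods that followed a (1, 2) window, that is, the price relatives (1, 0.5).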



Nonparametric nearest neighbor-based sample selection Györfi et al. [2008] searches for
the price relatives whose preceding market windows are within the $\ell$ nearest neighbors
of the latest market window in terms of Euclidean distance, that is,

$$C_N(\mathbf{x}_1^t, w) = \left\{ w < i < t+1 : \mathbf{x}_{i-w}^{i-1} \text{ is among the } \ell \text{ NNs of } \mathbf{x}_{t-w+1}^{t} \right\},$$

where $\ell$ is a threshold parameter.

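Correlation-driven nonparametric sample selection Li et al. [2011b] identifies the
linear similarity between two market windows via the correlation coefficient, that is,

$$C_C(\mathbf{x}_1^t, w) = \left\{ w < i < t+1 : \frac{\mathrm{cov}\left(\mathbf{x}_{i-w}^{i-1}, \mathbf{x}_{t-w+1}^{t}\right)}{\mathrm{std}\left(\mathbf{x}_{i-w}^{i-1}\right) \mathrm{std}\left(\mathbf{x}_{t-w+1}^{t}\right)} \ge \rho \right\},$$

where $\rho$ is a pre-defined correlation coefficient threshold.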
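The kernel-based, nearest neighbor-based, and correlation-driven similarity sets above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the list-of-lists data layout, and the single `radius` parameter (standing in for the ratio $c/\ell$) are hypothetical choices made here, not part of the surveyed algorithms. Indexes in the returned sets are 1-based, matching the notation $w < i < t+1$.

```python
import math


def flatten(window):
    """Concatenate a w x m market window (list of rows) into one flat vector."""
    return [value for row in window for value in row]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def correlation(u, v):
    """Pearson correlation; by convention here, 0.0 if either vector is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return cov / (norm_u * norm_v)


def windows(x, w):
    """Return the latest window and (i, preceding window) pairs for i = w+1..t.

    x[0] holds x_1, so the 1-based index i refers to x[i - 1].
    """
    t = len(x)
    if t <= w:  # not enough history to form any preceding window
        return [], []
    latest = flatten(x[t - w:t])
    candidates = [(i, flatten(x[i - w - 1:i - 1])) for i in range(w + 1, t + 1)]
    return latest, candidates


def select_kernel(x, w, radius):
    """Kernel-based set: windows within Euclidean distance `radius` (assumed c / l)."""
    latest, candidates = windows(x, w)
    return [i for i, win in candidates if euclidean(latest, win) <= radius]


def select_nearest(x, w, n_neighbors):
    """Nearest neighbor-based set: the n_neighbors closest preceding windows."""
    latest, candidates = windows(x, w)
    ranked = sorted(candidates, key=lambda c: euclidean(latest, c[1]))
    return sorted(i for i, _ in ranked[:n_neighbors])


def select_correlation(x, w, rho):
    """Correlation-driven set: windows whose correlation with the latest is >= rho."""
    latest, candidates = windows(x, w)
    return [i for i, win in candidates if correlation(latest, win) >= rho]
```

With a toy history of five price relative vectors in which the pattern of periods 1 and 2 recurs in periods 4 and 5, all three selectors with $w = 2$ pick index 3 as a similar price relative; the nearest neighbor variant with two neighbors additionally returns the next closest index.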



3.4.2 Portfolio Optimization Techniques

The second step of the Pattern-Matching based approaches is to construct an optimal
portfolio based on a similarity set $C$. The two main approaches are Kelly's capital growth
portfolio and Markowitz's mean variance portfolio. In the following, we illustrate several
techniques adopted in these approaches.



Györfi et al. [2006] proposed to figure out a log-optimal (Kelly) portfolio based on
the similar price relatives located in the first step, which clearly follows the Capital
Growth Theory. Given a similarity set, the log-optimal utility function is defined as

$$U_L\left(\mathbf{b}, C(\mathbf{x}_1^t)\right) = \mathbb{E}\left\{ \log \mathbf{b} \cdot \mathbf{x} \mid \mathbf{x}_i, i \in C(\mathbf{x}_1^t) \right\} = \sum_{i \in C(\mathbf{x}_1^t)} P_i \log \mathbf{b} \cdot \mathbf{x}_i,$$

where $P_i$ denotes the probability assigned to the similar price relative $\mathbf{x}_i$,
$i \in C(\mathbf{x}_1^t)$. Györfi et al. [2006] assume a uniform probability among the
similar samples, so it is equivalent to the following utility function,

$$U_L\left(\mathbf{b}, C(\mathbf{x}_1^t)\right) = \sum_{i \in C(\mathbf{x}_1^t)} \log \mathbf{b} \cdot \mathbf{x}_i.$$

Györfi et al. [2007] introduced the semi-log-optimal strategy, which approximates log
in the log-optimal utility function in order to ease the computational burden, and Vajda
[2006] presented a theoretical analysis and proved its universality. The semi-log-optimal
utility function is defined as

$$U_S\left(\mathbf{b}, C(\mathbf{x}_1^t)\right) = \mathbb{E}\left\{ f(\mathbf{b} \cdot \mathbf{x}) \mid \mathbf{x}_i, i \in C(\mathbf{x}_1^t) \right\} = \sum_{i \in C(\mathbf{x}_1^t)} P_i f(\mathbf{b} \cdot \mathbf{x}_i),$$

where $f(\cdot)$ is defined as the second order Taylor expansion of $\log z$ with respect to
$z = 1$, that is,

$$f(z) = z - 1 - \frac{1}{2}(z - 1)^2.$$

Györfi et al. [2007] assume a uniform probability among the similar samples, thus,
equivalently,

$$U_S\left(\mathbf{b}, C(\mathbf{x}_1^t)\right) = \sum_{i \in C(\mathbf{x}_1^t)} f(\mathbf{b} \cdot \mathbf{x}_i).$$



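Ottucsák and Vajda [2007] proposed the nonparametric Markowitz-type strategy, which
is a further generalization of the semi-log-optimal strategy. The basic idea of the
Markowitz-type strategy is to represent the portfolio return using Markowitz's idea
of trading off between portfolio mean and variance. To be specific, the Markowitz-type
utility function is defined as

$$\begin{aligned}
U_M\left(\mathbf{b}, C(\mathbf{x}_1^t)\right)
&= \mathbb{E}\left\{ \mathbf{b} \cdot \mathbf{x} \mid \mathbf{x}_i, i \in C(\mathbf{x}_1^t) \right\} - \lambda \mathrm{Var}\left\{ \mathbf{b} \cdot \mathbf{x} \mid \mathbf{x}_i, i \in C(\mathbf{x}_1^t) \right\} \\
&= \mathbb{E}\left\{ \mathbf{b} \cdot \mathbf{x} \mid \mathbf{x}_i, i \in C(\mathbf{x}_1^t) \right\} - \lambda \mathbb{E}\left\{ (\mathbf{b} \cdot \mathbf{x})^2 \mid \mathbf{x}_i, i \in C(\mathbf{x}_1^t) \right\} \\
&\quad + \lambda \left( \mathbb{E}\left\{ \mathbf{b} \cdot \mathbf{x} \mid \mathbf{x}_i, i \in C(\mathbf{x}_1^t) \right\} \right)^2,
\end{aligned}$$

where $\lambda$ is a trade-off parameter. In particular, a simple numerical transformation
shows that the semi-log-optimal portfolio is an instance of this utility function with a
specified $\lambda$.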
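To make the three utility functions above concrete, the following minimal Python sketch evaluates them for a given portfolio on a non-empty list of similar price relatives. It assumes the uniform-probability setting and, as a choice made only for this illustration, uses averages (probability $1/|C|$ per sample) rather than plain sums, which does not change the maximizer. The function names and the `lam` parameter name are hypothetical.

```python
import math


def dot(b, x):
    """Portfolio period return b . x."""
    return sum(bi * xi for bi, xi in zip(b, x))


def log_optimal_utility(b, samples):
    """Average of log(b . x_i) over the similar samples (uniform probability)."""
    return sum(math.log(dot(b, x)) for x in samples) / len(samples)


def semi_log(z):
    """Second order Taylor expansion of log z around z = 1."""
    return (z - 1.0) - 0.5 * (z - 1.0) ** 2


def semi_log_optimal_utility(b, samples):
    """Average of f(b . x_i) over the similar samples (uniform probability)."""
    return sum(semi_log(dot(b, x)) for x in samples) / len(samples)


def markowitz_utility(b, samples, lam):
    """Mean of b . x minus lam times its variance, both over the similar samples."""
    returns = [dot(b, x) for x in samples]
    mean = sum(returns) / len(returns)
    second_moment = sum(r * r for r in returns) / len(returns)
    return mean - lam * (second_moment - mean ** 2)
```

For period returns close to one, `semi_log` stays close to the logarithm, which is why the semi-log-optimal utility can serve as a cheaper surrogate for the log-optimal one.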

To solve the problem with transaction costs, Györfi and Vajda [2008] proposed a GV-type
utility function (Algorithm 2 in Györfi and Vajda [2008]; their Algorithm 1 follows
the same procedure as the log-optimal utility) by incorporating the transaction costs, as
follows,

$$U_T\left(\mathbf{b}, C(\mathbf{x}_1^t)\right) = \mathbb{E}\left\{ \log \mathbf{b} \cdot \mathbf{x} + \log w(\mathbf{b}_t, \mathbf{b}, \mathbf{x}_t) \right\},$$

where $w(\cdot) \in (0, 1)$ is the transaction cost factor, which represents the remaining
proportion after the transaction costs imposed by the market. The detailed calculation of
the factor is illustrated in Section 2.1. Under the uniform probability assumption on
the similarity set, it is equivalent to calculate

$$U_T\left(\mathbf{b}, C(\mathbf{x}_1^t)\right) = \sum_{i \in C(\mathbf{x}_1^t)} \left( \log \mathbf{b} \cdot \mathbf{x}_i + \log w(\mathbf{b}_t, \mathbf{b}, \mathbf{x}_t) \right).$$

In any of the above procedures, if the similarity set is non-empty, we can obtain an
optimal portfolio based on the similar price relatives and their probabilities. If the set
is empty, we can choose either the uniform portfolio or the previous portfolio.
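As an illustration of the optimization step with the log-optimal utility under uniform probabilities, the sketch below maximizes the average log return over a similarity set and falls back to the uniform portfolio when the set is empty. The solver is an assumption made here for self-containedness: it uses Cover's multiplicative fixed-point iteration for log-optimal portfolios, whereas the surveyed papers do not prescribe this particular solver in this section. The function name and the `n_iter` parameter are hypothetical.

```python
def log_optimal_portfolio(samples, m, n_iter=500):
    """Approximate argmax over the simplex of the average log(b . x_i).

    samples: list of similar price relative vectors (each of length m).
    Returns the uniform portfolio if the similarity set is empty.
    """
    if not samples:
        return [1.0 / m] * m

    b = [1.0 / m] * m  # start in the interior of the simplex
    n = len(samples)
    for _ in range(n_iter):
        # weights[j] is the average of x_ij / (b . x_i); it equals the gradient
        # of the average log return with respect to b_j.
        weights = [0.0] * m
        for x in samples:
            period_return = sum(bj * xj for bj, xj in zip(b, x))
            for j in range(m):
                weights[j] += x[j] / (period_return * n)
        # The multiplicative update keeps b on the simplex analytically;
        # renormalizing only guards against floating-point drift.
        b = [bj * wj for bj, wj in zip(b, weights)]
        total = sum(b)
        b = [bj / total for bj in b]
    return b
```

For example, with one asset that either triples or halves with equal probability and a second asset that stays flat, the maximizer of the expected log return puts three quarters of the wealth in the first asset, and the iteration above converges to that allocation.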



3.4.3 Combinations 

In this section, let us combine the first step and the second step and describe the
detailed algorithms in the Pattern-Matching based approaches. Table 3 shows the existing
combinations, in which blanks mean that no existing algorithm exploits the combination.

One default utility function is the log-optimal function or the BCRP portfolio.
Györfi and Schäfer [2003] introduced the nonparametric histogram-based log-optimal
investment strategy ($B^H$), which combines the histogram-based sample selection and
the log-optimal utility function, and proved its universality. Györfi et al. [2006]
presented the nonparametric kernel-based log-optimal investment strategy ($B^K$), which
combines the kernel-based sample selection and the log-optimal utility function, and
proved its universality. Györfi et al. [2008] proposed the nonparametric nearest neighbor
log-optimal investment strategy ($B^{NN}$), which combines the nearest neighbor sample
selection and the log-optimal utility function, and proved its universality. Li et al.
[2011b] created the correlation-driven nonparametric learning approach (CORN) by combining
the correlation-driven sample selection and the log-optimal utility function, and showed
its superior empirical performance over the previous three combinations. Besides the
log-optimal utility function, several algorithms using different utility functions have
been proposed. Györfi et al. [2007] proposed the nonparametric kernel-based
semi-log-optimal investment strategy ($B^S$) by combining the kernel-based sample selection
and the semi-log-optimal utility function to ease the computation of $B^K$. Ottucsák and
Vajda [2007] proposed the nonparametric kernel-based Markowitz-type investment strategy
($B^M$) by combining the kernel-based sample selection and the Markowitz-type utility
function to make trade-offs between the return (mean) and risk (variance) of the expected
portfolio return. Györfi and Vajda [2008] proposed the nonparametric kernel-based GV-type
investment strategy ($B^{GV}$) by combining the kernel-based sample selection and the
GV-type utility function to construct portfolios in the case of transaction costs.

Table 3: Pattern-Matching based approaches: sample selection and portfolio optimization.

                          Sample Selection Techniques
Portfolio Optimization    Histogram          Kernel              Nearest Neighbor     Correlation
Log-optimal               B^H: C_H + U_L     B^K: C_K + U_L      B^NN: C_N + U_L      CORN: C_C + U_L
Semi log-optimal                             B^S: C_K + U_S
Markowitz-type                               B^M: C_K + U_M
GV-type                                      B^GV: C_K + U_T



3.5 Meta-Learning Algorithms 

Another category of research in the area of on-line portfolio selection is the Meta-
Learning Algorithm (MLA) Das and Banerjee [2011], which is closely related to expert
learning Cesa-Bianchi and Lugosi [2006] in the machine learning community. This is
directly applicable to a "Fund of Funds", which delegates its portfolio assets to other
funds. In general, MLA assumes several base experts, either from the same strategy class
or from different classes. Each expert outputs a portfolio vector for the coming period,
and MLA combines these portfolios to form a final portfolio, which is used for the next
rebalance. MLAs are similar to the algorithms in the "Follow-the-Winner" approaches;
however, they are proposed to handle a broader class of experts, of which CRP can serve as
one special case. On the one hand, an MLA system can be used to smooth the final
performance with respect to all underlying experts, especially when base experts are
sensitive to certain environments or parameters. On the other hand, combining a universal
strategy with a heuristic algorithm for which a theoretical bound is not easy to obtain,
such as Anticor, can provide the universality property to the whole MLA system. Finally,
MLA is able to combine all existing algorithms, and thus provides a much broader area of
application.
3.5.1 Aggregating Algorithms



Besides the algorithms discussed in Section 3.2.5, the Aggregating Algorithm (AA) Vovk
[1990], Vovk and Watkins [1998] can also include more sophisticated base experts. Its
update of the experts is the same as previously described, so we do not go into the
details again.



3.5.2 Fast Universalization 



Akcoglu et al. [2002, 2004] proposed Fast Universalization (FU), which extends Cover's
Universal Portfolios Cover [1991] from the parameterized CRP class to a wide class of
investment strategies, including trading strategies operating on a single stock and
portfolio strategies allocating wealth among the whole stock market. FU's basic idea is to
evenly split the wealth among a set of base experts, let these experts operate on their
own, and finally pool their wealth. FU's update is similar to that of Cover's UP, and it
also asymptotically achieves wealth equal to that of an optimal fixed convex combination
of the base experts. In cases where all experts are CRPs, FU reduces to Cover's UP.

Besides the universalization in the continuous parameter space, various discrete buy
and hold (BAH) combinations have been adopted by existing algorithms. Rewritten in
its discrete form, the update can be straightforwardly obtained. For example, Borodin et
al. [2003, 2004] adopted the BAH strategy to combine Anticor experts with a finite number
of window sizes. Li et al. [2012] combined PAMR experts with a finite number of mean
reversion thresholds. Moreover, all Pattern-Matching based approaches in Section 3.4 used
BAH to combine their underlying experts, also with a finite number of window sizes.
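The buy and hold combination of experts described above (split the wealth evenly, let each expert trade on its own, then pool the wealth) can be sketched as follows. The observation used here is that, when wealth is pooled without rebalancing across experts, the pooled portfolio for the next period is the wealth-weighted average of the experts' portfolios. The function names and data layout are hypothetical and chosen only for this illustration.

```python
def combine_experts(expert_portfolios, expert_wealth):
    """Pooled portfolio: wealth-weighted average of the experts' portfolios."""
    total = sum(expert_wealth)
    m = len(expert_portfolios[0])
    return [
        sum(w * b[j] for w, b in zip(expert_wealth, expert_portfolios)) / total
        for j in range(m)
    ]


def update_wealth(expert_portfolios, expert_wealth, x):
    """Each expert's wealth grows by its own period return b . x."""
    return [
        w * sum(bj * xj for bj, xj in zip(b, x))
        for w, b in zip(expert_wealth, expert_portfolios)
    ]


# Usage: two single-stock experts that start with equal wealth.
experts = [[1.0, 0.0], [0.0, 1.0]]
wealth = [1.0, 1.0]
pooled = combine_experts(experts, wealth)           # even split at the start
wealth = update_wealth(experts, wealth, [2.0, 1.0])  # first stock doubles
pooled_next = combine_experts(experts, wealth)       # weight shifts to the winner
```

In this usage example the pooled portfolio shifts toward the expert that performed better, which reflects the momentum behavior on the expert level.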



3.5.3 Online Gradient & Newton Updates 



Das and Banerjee [2011] proposed two meta optimization algorithms, named Online
Gradient Update (OGU) and Online Newton Update (ONU), which are natural extensions
of Exponential Gradient (EG) and Online Newton Step (ONS), respectively.
Since their updates and proofs are similar to those of their predecessors, we omit them
here. Theoretically, OGU and ONU can achieve the same growth rate as the optimal convex
combination of the underlying experts. In particular, if any base expert is universal,
then the final meta system enjoys the universality property. Such a property is useful
since a meta-learning algorithm can incorporate a heuristic algorithm and a universal
algorithm, and the final system can enjoy the heuristic's performance while keeping the
universality property.



3.5.4 Follow the Leading History 

Hazan and Seshadhri [2009] proposed a Follow the Leading History (FLH) algorithm
for changing environments. FLH can incorporate various universal base experts, such
as the ONS algorithm. Its basic idea is to maintain a working set of finitely many
experts, which are dynamically added and dropped according to their performance, and to
allocate the weights among the active working experts with a meta-learning algorithm, for
example, the Herbster-Warmuth algorithm Herbster and Warmuth [1998]. Different from
other meta-learning algorithms, whose experts all operate from the same starting period,
FLH adopts experts starting from different periods. Theoretically, the FLH algorithm with
universal methods is universal. Empirically, FLH equipped with ONS can significantly
outperform ONS.



4 Connection with Capital Growth Theory 

Most on-line portfolio selection algorithms introduced above can be interestingly con- 
nected to the Capital Growth Theory. In this section, we first introduce the Capital 
Growth Theory for portfolio selection, and then connect the previous algorithms to the 
Capital Growth Theory in order to reveal their underlying trading principles. 



4.1 Capital Growth Theory for Portfolio Selection

Originating from gambling, the Capital Growth Theory (CGT) Hakansson and Ziemba [1995]
(also termed Kelly investment Kelly [1956] or Growth Optimal Portfolio (GOP) Györfi et
al. [2012]) can generally be adopted for on-line portfolio selection. Breiman [1961]
generalized the Kelly criterion to multiple investment opportunities. Thorp [1971] and
Hakansson [1971] focused on the theory of the Kelly criterion or logarithmic utility for
the portfolio selection problem. Now let us briefly introduce the theory for portfolio
selection.

The basic procedure of CGT is to maximize the expected log return for each period.
It involves two related steps, that is, prediction and portfolio optimization. For the
prediction step, CGT receives the predicted distribution of price relative combinations
$\hat{\mathbf{x}}_{t+1} = (\hat{x}_{t+1,1}, \ldots, \hat{x}_{t+1,m})$, which can be obtained
as follows. For each investment $i$, one can predict a finite number of distinct values and
corresponding probabilities. Let the range of $\hat{x}_{t+1,i}$ be
$\{r_{i,1}; \ldots; r_{i,N_i}\}$, $i = 1, \ldots, m$, and let the corresponding probability
for each possible value $r_{i,j}$ be $p_{i,j}$. Based on these predictions, one can
estimate their joint vectors and corresponding joint probabilities. In this way, there
are in total $\prod_{i=1}^{m} N_i$ possible prediction combinations, each of which is in
the form

$$\hat{\mathbf{x}}_{t+1}^{(k_1, k_2, \ldots, k_m)} = \left[ \hat{x}_{t+1,1} = r_{1,k_1} \text{ and } \hat{x}_{t+1,2} = r_{2,k_2} \text{ and } \ldots \text{ and } \hat{x}_{t+1,m} = r_{m,k_m} \right]$$

with a probability of $p^{(k_1, k_2, \ldots, k_m)} = \prod_{j=1}^{m} p_{j,k_j}$. Given these
predictions, CGT tries to obtain an optimal portfolio that maximizes the expected log
return, that is,

$$\begin{aligned}
\mathbb{E} \log S &= \sum p^{(k_1, k_2, \ldots, k_m)} \log \left( \mathbf{b} \cdot \hat{\mathbf{x}}_{t+1}^{(k_1, k_2, \ldots, k_m)} \right) \\
&= \sum p^{(k_1, k_2, \ldots, k_m)} \log \left( b_1 r_{1,k_1} + \cdots + b_m r_{m,k_m} \right),
\end{aligned}$$

where the summation is over all $\prod_{i=1}^{m} N_i$ price relative combinations. The
above objective is concave in $\mathbf{b}$, and thus its maximization can be efficiently
solved via convex optimization Boyd and Vandenberghe [2004].
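The prediction step above, which enumerates all joint combinations and multiplies the per-asset probabilities, can be sketched directly in Python. This is a minimal illustration that evaluates the expected log return for a given portfolio; it does not perform the maximization. The names `joint_scenarios` and `expected_log_return`, and the nested-list layout of `values` and `probs`, are hypothetical.

```python
import itertools
import math


def joint_scenarios(values, probs):
    """Yield (price relative vector, joint probability) for every combination.

    values[i] lists the possible values r_{i,1..N_i} of asset i, and probs[i]
    lists the matching probabilities p_{i,1..N_i}. Joint probabilities are the
    product of the per-asset probabilities, as in the text.
    """
    index_ranges = [range(len(v)) for v in values]
    for ks in itertools.product(*index_ranges):
        x = [values[i][k] for i, k in enumerate(ks)]
        p = math.prod(probs[i][k] for i, k in enumerate(ks))
        yield x, p


def expected_log_return(b, values, probs):
    """Sum over all combinations of p * log(b . x)."""
    return sum(
        p * math.log(sum(bi * xi for bi, xi in zip(b, x)))
        for x, p in joint_scenarios(values, probs)
    )
```

Note that the number of combinations grows as the product of the $N_i$, so this explicit enumeration is only practical for a small number of assets and predicted values.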



4.2 On-Line Portfolio Selection and Capital Growth Theory 

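Most existing on-line portfolio selection algorithms have close connections with the
Capital Growth Theory. While the Capital Growth Theory provides a theoretically
guaranteed framework for portfolio selection, on-line portfolio selection algorithms
mainly connect to the Capital Growth Theory from two aspects, illustrated as follows.

Table 4: On-line portfolio selection and the Capital Growth Theory.

Algorithms           $\hat{\mathbf{x}}_{t+1}$              Prob.       Capital growth forms
BCRP*                $\mathbf{x}_i, i = 1, \ldots, n$      $1/n$       $\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \frac{1}{n} \sum_{i=1}^{n} \log \mathbf{b} \cdot \mathbf{x}_i$
EG                   $\mathbf{x}_t$                        100%        $\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \log \mathbf{b} \cdot \mathbf{x}_t - \lambda R(\mathbf{b}, \mathbf{b}_t)$
PAMR                 $1/\mathbf{x}_t$                      100%        $\mathbf{b}_{t+1} = \arg\min_{\mathbf{b} \in \Delta_m} \mathbf{b} \cdot \mathbf{x}_t + \lambda R(\mathbf{b}, \mathbf{b}_t)$
CWMR                 $1/\mathbf{x}_t$                      100%        $\mathbf{b}_{t+1} = \arg\min_{\mathbf{b} \in \Delta_m} P(\mathbf{b} \cdot \mathbf{x}_t) + \lambda R(\mathbf{b}, \mathbf{b}_t)$
OLMAR                moving-average estimate               100%        $\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \mathbf{b} \cdot \hat{\mathbf{x}}_{t+1} - \lambda R(\mathbf{b}, \mathbf{b}_t)$
B^H/B^K/B^NN/CORN    $\mathbf{x}_i, i \in C_t$             $1/|C_t|$   $\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \sum_{i \in C_t} \log \mathbf{b} \cdot \mathbf{x}_i$
B^GV                 $\mathbf{x}_i, i \in C_t$             $1/|C_t|$   $\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \sum_{i \in C_t} \left( \log \mathbf{b} \cdot \mathbf{x}_i + \log w(\cdot) \right)$
FTL                  $\mathbf{x}_i, i = 1, \ldots, t$      $1/t$       $\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \frac{1}{t} \sum_{i=1}^{t} \log \mathbf{b} \cdot \mathbf{x}_i$
FTRL                 $\mathbf{x}_i, i = 1, \ldots, t$      $1/t$       $\mathbf{b}_{t+1} = \arg\max_{\mathbf{b} \in \Delta_m} \frac{1}{t} \sum_{i=1}^{t} \log \mathbf{b} \cdot \mathbf{x}_i - \lambda R(\mathbf{b})$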

For the first connection, we assume that the market sequence (price relative vectors) is
i.i.d. Then, according to the Capital Growth Theory, the optimal portfolio achieving the
best cumulative performance is a fixed fraction portfolio, which belongs to the
CRP strategy class, and the best cumulative wealth is achieved by BCRP in hindsight,
as illustrated in Section 3.1.3. We further rewrite the BCRP strategy in the form of the
Capital Growth Theory, as shown in the first row of Table 4. Thus, pioneered by Cover
[1991], the first connection tries to asymptotically achieve the same exponential growth
rate as that of BCRP in hindsight. The gap between the two algorithms is also termed 
regret illustrated by Eq. (01). An algorithm with asymptotically zero regret is further 
called a universal portfolio selection strategy. 

The above connection is mainly pursued by the Follow-the-Winner approaches. In
particular, the first four algorithms in the category, that is, Universal Portfolios,
Exponential Gradient, Follow the Leader, and Follow the Regularized Leader, all give regret
bounds that asymptotically approach zero as the trading period goes to infinity. In
other words, these algorithms can achieve the same exponential growth rate as the CGT
optimal portfolio in hindsight. In contrast, the Aggregating-type Algorithms extend on-line
portfolio selection from the CRP class to other strategy classes, which may not be CGT
optimal. This connection also coincides with the competitive analysis in Borodin et al.
[2000].

Besides implicitly targeting the exponential growth rate of BCRP in hindsight, the
second connection explicitly adopts the idea of the Capital Growth Theory for on-line
portfolio selection. For each period, an algorithm requires the predicted price relative
combinations and their corresponding probabilities. Unless stated otherwise, we assume
that the portfolio decision is made for the $(t+1)$-th period. In Table 4, we summarize the
algorithms in the forms of the Capital Growth Theory. We present their implicit market
distributions, denoted by their values ($\hat{\mathbf{x}}_{t+1}$) and probabilities
(Prob.). We also rewrite all these algorithms in capital growth forms, that is, to maximize
the expected log return for the $(t+1)$-th period, in the fourth column. The regularization
terms are denoted as $R(\mathbf{b}, \mathbf{b}_t)$, which preserves the information of the
last portfolio vector ($\mathbf{b}_t$), and $R(\mathbf{b})$, which constrains the
variability of a portfolio vector. Based on the number of predictions, we can categorize
most existing algorithms into the following three categories.

Table 5: On-line portfolio selection and their underlying trading principles.

Principles        Algorithms                  $\hat{\mathbf{x}}_{t+1}$             Prob.       $\mathbb{E}\{\hat{\mathbf{x}}_{t+1}\}$
Momentum          EG                          $\mathbf{x}_t$                       100%        $\mathbf{x}_t$
                  FTL/FTRL                    $\mathbf{x}_i, i = 1, \ldots, t$     $1/t$       $\frac{1}{t} \sum_{i=1}^{t} \mathbf{x}_i$
Mean reversion    CRP/UP                      n/a                                  n/a         n/a
                  Anticor                     n/a                                  n/a         n/a
                  PAMR/CWMR                   $1/\mathbf{x}_t$                     100%        $1/\mathbf{x}_t$
                  OLMAR                       moving-average estimate              100%        moving-average estimate
Others            B^H/B^K/B^NN/B^GV/CORN      $\mathbf{x}_i, i \in C_t$            $1/|C_t|$   $\frac{1}{|C_t|} \sum_{i \in C_t} \mathbf{x}_i$
                  AA                          n/a                                  n/a         n/a

The first category, including EG/PAMR/CWMR/OLMAR, predicts a single scenario
with certainty, and tries to build an optimal portfolio. Note that the capital growth
forms of PAMR and CWMR are rewritten from their original forms; however, their
essential ideas are kept. Moreover, PAMR, CWMR, and OLMAR ignore the log utility
function, since adding log utility follows the same idea but causes convexity issues.
Though such a prediction seems risky, these algorithms all adopt regularization terms,
such as $R(\mathbf{b}, \mathbf{b}_t) = \|\mathbf{b} - \mathbf{b}_t\|^2$, to keep the
information of the last portfolio, such that the next portfolio is not far from the last
one, which indeed reduces the risk.

The second category, including the Pattern-Matching approaches, predicts multiple
scenarios that are similar to the next price relative vector. In particular, it expects
the next price relative to be $\mathbf{x}_i$, $i \in C$, with a uniform probability of
$1/|C|$, where $C$ denotes the similarity set. Then, the category tries to maximize its
expected log return in terms of the similarity set, which is consistent with the Capital
Growth Theory and results in an optimal fixed fraction portfolio. Note that several
algorithms in the Pattern-Matching approaches, including $B^S$, $B^M$, and $B^{GV}$, adopt
different portfolio optimization approaches, which we do not list here.

The third category, including FTL and FTRL, predicts the next scenario using all
historical price relatives. In particular, it predicts that the next price relative vector
equals $\mathbf{x}_i$, $i = 1, \ldots, t$, with a uniform probability of $1/t$. Based on
such a prediction, this category of strategies aims to maximize the expected log return,
and FTRL additionally subtracts a regularization term. Note that, different from the
regularization terms in the first category, the regularization term in this category, such
as $R(\mathbf{b}) = \|\mathbf{b}\|^2$, only contains the deviation of the next portfolio.
This can be attributed to the fact that the predictions already contain all available
information for the portfolio selection task.

4.3 Underlying Trading Principles 

Besides the aspect of the Capital Growth Theory, most existing algorithms also follow
certain trading ideas to predict their next price relatives. We summarize their implicit
trading ideas via three trading principles, that is, momentum, mean reversion, and others
(for example, nonparametric prediction), in Table 5. Note that the expected vector
($\mathbb{E}\{\hat{\mathbf{x}}_{t+1}\}$) is computed element-wise.

Momentum strategy Chan et al. [1996], Rouwenhorst [1998], Moskowitz and Grinblatt
[1999], Lee and Swaminathan [2000], George and Hwang [2004], Cooper et al. [2004]
assumes winners (losers) will still be winners (losers) in the following period. By
observing the algorithms' underlying prediction schemes, we can classify EG/FTL/FTRL
in this category. While EG assumes that the next price relative will be the same as the
last one, FTL and FTRL assume that the next price relative is expected to be the
historical average price relative. Mean reversion strategy Bondt and Thaler [1985, 1987],
Poterba and Summers [1988], Jegadeesh [1991], Chaudhuri and Wu [2003], on the other
hand, assumes that winners (losers) will be losers (winners) in the following period.
Clearly, CRP and UP, Anticor, and PAMR/CWMR belong to this category. Here, it
is worth noting that UP is an expert combination of CRP strategies, and we classify it
via its implicit assumption on the underlying stocks. If we observe from the experts'
perspective, UP transfers wealth from CRP losers to CRP winners, which is actually
momentum on the CRP strategies. Moreover, the expected price relative of PAMR and CWMR
is explicitly the inverse of the last price relative, which is the opposite of EG.
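The implicit predictions discussed above can be written down as small functions of the price relative history. The sketch below is only an illustration of the element-wise expected price relatives under each principle; the function names are hypothetical, and the pattern-matching variant assumes a non-empty similarity set given as 1-based indexes.

```python
def expected_eg(history):
    """Momentum (EG): the next price relative equals the last one."""
    return list(history[-1])


def expected_ftl(history):
    """Momentum (FTL/FTRL): element-wise average of all historical price relatives."""
    t = len(history)
    m = len(history[0])
    return [sum(x[j] for x in history) / t for j in range(m)]


def expected_pamr(history):
    """Mean reversion (PAMR/CWMR): element-wise inverse of the last price relative."""
    return [1.0 / v for v in history[-1]]


def expected_pattern_matching(history, similar_indexes):
    """Others: element-wise average over the similarity set (1-based indexes)."""
    m = len(history[0])
    n = len(similar_indexes)
    return [sum(history[i - 1][j] for i in similar_indexes) / n for j in range(m)]
```

For a history ending in the price relative (2.0, 0.5), EG expects (2.0, 0.5) again, while PAMR and CWMR expect (0.5, 2.0), which makes the opposition between the two principles explicit.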

Other trading ideas, including nonparametric prediction Györfi and Schäfer [2003],
Biau et al. [2010] based on the Pattern-Matching approach, cannot be classified into the
above two categories. For example, for the Pattern-Matching approaches, the average
of the price relatives in a similarity set may be regarded as either momentum or mean
reversion. Besides, the classification of AA depends on the type of its underlying experts.
From the experts' perspective, AA always transfers wealth from loser experts to winner
experts, which is a momentum strategy on the expert level. From the stocks' perspective,
which is the assumption in Table 5, the classification of AA coincides with that of its
underlying experts. That is, if the underlying experts are single stock strategies, which
are momentum strategies, then we view AA's trading idea as momentum. On the other
hand, if the underlying experts are CRP strategies, which are mean reversion strategies,
we regard AA's trading idea as mean reversion.



5 Challenges and Future Directions 

The on-line portfolio selection task is an important generalization of asset management.
Though existing algorithms perform well theoretically or empirically in back tests,
researchers have encountered several challenges in designing the algorithms. In this
section, we focus on the two consecutive steps in on-line portfolio selection, that is,
prediction and portfolio optimization. In particular, we illustrate open challenges in
the prediction step in Section 5.1 and list several other challenges in the portfolio
optimization step in Section 5.2. There are many opportunities in this area, and it is
worth further exploration.

5.1 Accurate Prediction by Advanced Techniques 

As we analyzed in Section 4.2, existing algorithms implicitly assume various prediction
schemes. While current assumptions can result in good performance, they are far from
perfect. Thus, the challenges for the prediction step are often related to the design of
more subtle prediction schemes in order to produce more accurate predictions of the
price relative distribution.

— Searching patterns. In the Pattern-Matching based approach, in spite of the many
sample selection techniques introduced, efficiently recognizing patterns in the financial
markets is often challenging. Moreover, existing algorithms always assume a uniform
probability on the similar samples, while it remains a challenge to assign appropriate
probabilities in the hope of predicting more accurately.

— Utilizing stylized facts in returns. In econometrics, there exist many stylized
facts, which refer to empirical findings that are so consistent that they are accepted as
truth. One stylized fact is related to the autocorrelations in various assets' returns^1.
It is often observed that some stocks/indices show positive daily/weekly/monthly
autocorrelations Fama [1970], Lo and MacKinlay [1999], Llorente et al. [2002], while some
others have negative daily autocorrelations Lo and MacKinlay [1988, 1990], Jegadeesh
[1990]. An open challenge is to predict the price relatives by utilizing the
autocorrelations.



— Utilizing stylized facts in absolute/square returns. Another stylized fact Taylor
[2005] is that the autocorrelations in absolute and squared returns are much higher
than those in simple returns. Such a fact indicates that there may be a consistent
nonlinear relationship within the time series, which various machine learning techniques
may exploit to boost the prediction accuracy. However, in the current prediction step,
such information is rarely exploited efficiently, and this constitutes a challenge.

— Utilizing calendar effects. It is well known that there exist some calendar effects,
such as the January effect or turn-of-the-year effect Rozeff and Kinney [1976], Haugen and
Lakonishok [1987], Moller and Zilca [2008], the holiday effect Fields [1934], Brockman and
Michayluk [1998], Dzhabarov and Ziemba [2010], etc. No existing algorithm exploits such
information, which could potentially provide more accurate predictions. Thus, another open
challenge is to take advantage of these calendar effects in the prediction step.

— Exploiting additional information. Almost all existing prediction schemes focus
solely on the price relative (or price), although there exists much other useful side
information, such as volume, fundamentals, and experts' opinions. Cover and Ordentlich
[1996] presented a preliminary model to incorporate some of them, which is, however, far
from applicable. Thus, it is an open challenge to incorporate other sources of information
to facilitate the prediction of the next price relatives.
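The two autocorrelation-related stylized facts in the list above can be checked on data with a sketch like the following, which computes the sample autocorrelation of simple returns and of absolute returns. It uses the convention, stated in this survey's footnote, that return equals price relative minus one. The function names and the toy series are hypothetical and serve only as an illustration.

```python
def to_returns(price_relatives):
    """Simple returns from a single asset's price relatives: return = x - 1."""
    return [x - 1.0 for x in price_relatives]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag; 0.0 if it is undefined."""
    n = len(series)
    if lag >= n:
        return 0.0
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    if variance_sum == 0.0:
        return 0.0
    covariance_sum = sum(
        (series[k] - mean) * (series[k + lag] - mean) for k in range(n - lag)
    )
    return covariance_sum / variance_sum


# Toy series: a calm regime followed by a volatile regime, with mixed signs.
price_relatives = [1.01, 0.99, 0.99, 1.01, 1.05, 0.95, 0.95, 1.05]
returns = to_returns(price_relatives)
raw_acf = autocorrelation(returns, 1)
abs_acf = autocorrelation([abs(r) for r in returns], 1)
```

On this toy series the lag-one autocorrelation of absolute returns is clearly positive while that of simple returns is slightly negative, mirroring the kind of volatility clustering that the stylized fact describes.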



5.2 Open Issues of Portfolio Optimization 

Portfolio optimization is the subsequent step for on-line portfolio selection. While the
Capital Growth Theory is effective in maximizing final cumulative wealth, which is the
aim of this task, it often incurs high risk Thorp [1997], which is also quite important
for an investor. Open issues in this category are often associated with risk concerns,
which are not taken into account in the current target.



— Incorporating risk. Mean Variance theory Markowitz [1952] adopts variance
as a proxy for risk. However, simply adding a variance term may not efficiently trade off
between return and risk. Thus, one challenge is to exploit an effective proxy for risk
and efficiently trade off between return and risk in the scenario of on-line portfolio
selection.

^1 The econometrics community often adopts return, which equals price relative minus one.

— Utilizing "optimal f". One recent advancement in money management is "optimal f",
due to Vince, which is proposed to handle the drawbacks
of Kelly's theory. Optimal f can reduce the risk of Kelly's approach; however, it
requires an additional estimate of the expected drawdown, which is hard to obtain. Thus,
one challenge is to explore the power of optimal f and efficiently incorporate it into the
area of on-line portfolio selection.

— Loosening portfolio constraints. The current portfolio is constrained to a simplex,
which means the portfolio is self-financed and neither margin nor short selling is allowed.
Besides the current long-only portfolio, there also exist long/short portfolios, which
allow short selling and margin. Cover [1991] proposed a proxy to evaluate an algorithm
when margin is allowed, by adding an additional component for each asset. Cover and
Ordentlich [1998] proposed one universal method to exploit the portfolio when margin and
short selling are allowed. However, current methods are still in their infancy and far
from application. Thus, the challenge is to develop effective algorithms when margin and
short selling are allowed.

— Extending transaction costs. To make an algorithm practical, one has to consider
some practical issues, such as transaction costs. Though several theoretical
models considering transaction costs have been proposed Blum and Kalai [1999], Iyengar
[2005], Györfi and Vajda [2008], they cannot be explicitly conveyed from an algorithmic
perspective, and are thus hard to understand. Thus, one challenge is to extend current
algorithms to the cases when transaction costs are taken into account.

— Extending market liquidity. Another consideration is market liquidity. Although
all algorithms claim that in the back tests they choose blue chip stocks, which have
the highest liquidity, this cannot resolve the concern unless they test their
algorithms in paper trading or real trading, which is much harder. Besides, no algorithm
has ever considered this issue in its algorithm formulation. The challenge here is to
accurately model the market liquidity and then design efficient algorithms.



6 Conclusions 

This article conducted a survey on the on-line portfolio selection task, an
interdisciplinary topic of machine learning and finance. With the focus on the algorithmic
aspects, we began by formulating the task as a sequential decision learning problem, and
further categorized the existing on-line portfolio selection algorithms into five major
groups: Benchmarks, Follow-the-Winner, Follow-the-Loser, Pattern-Matching based
approaches, and Meta-Learning algorithms. After presenting the surveys of the individual
algorithms, we further connected these existing algorithms to the Capital Growth
Theory from two aspects, to better understand the essentials of their underlying trading
ideas. Finally, we outlined some open challenges for further investigation. We note
that, although quite a few algorithms have been proposed in the literature, many open
research problems remain unsolved and deserve further exploration. We hope this survey
article can help researchers understand the state-of-the-art research in this area
and potentially inspire their further research.